Instruction Set
Ensemble Extract Immediate

E.MULX.I.32.N      Ensemble multiply extract immediate signed quadlets nearest
E.MULX.I.32.Z      Ensemble multiply extract immediate signed quadlets zero
E.MULX.I.64.C      Ensemble multiply extract immediate signed octlets ceiling
E.MULX.I.64.F      Ensemble multiply extract immediate signed octlets floor
E.MULX.I.64.N      Ensemble multiply extract immediate signed octlets nearest
E.MULX.I.64.Z      Ensemble multiply extract immediate signed octlets zero
E.MULX.I.C.8.C     Ensemble multiply extract immediate complex bytes ceiling
E.MULX.I.C.8.F     Ensemble multiply extract immediate complex bytes floor
E.MULX.I.C.8.N     Ensemble multiply extract immediate complex bytes nearest
E.MULX.I.C.8.Z     Ensemble multiply extract immediate complex bytes zero
E.MULX.I.C.16.C    Ensemble multiply extract immediate complex doublets ceiling
E.MULX.I.C.16.F    Ensemble multiply extract immediate complex doublets floor
E.MULX.I.C.16.N    Ensemble multiply extract immediate complex doublets nearest
E.MULX.I.C.16.Z    Ensemble multiply extract immediate complex doublets zero
E.MULX.I.C.32.C    Ensemble multiply extract immediate complex quadlets ceiling
E.MULX.I.C.32.F    Ensemble multiply extract immediate complex quadlets floor
E.MULX.I.C.32.N    Ensemble multiply extract immediate complex quadlets nearest
E.MULX.I.C.32.Z    Ensemble multiply extract immediate complex quadlets zero
E.MULX.I.C.64.C    Ensemble multiply extract immediate complex octlets ceiling
E.MULX.I.C.64.F    Ensemble multiply extract immediate complex octlets floor
E.MULX.I.C.64.N    Ensemble multiply extract immediate complex octlets nearest
E.MULX.I.C.64.Z    Ensemble multiply extract immediate complex octlets zero
E.MULX.I.M.8.C     Ensemble multiply extract immediate mixed-signed bytes ceiling
E.MULX.I.M.8.F     Ensemble multiply extract immediate mixed-signed bytes floor
E.MULX.I.M.8.N     Ensemble multiply extract immediate mixed-signed bytes nearest
E.MULX.I.M.8.Z     Ensemble multiply extract immediate mixed-signed bytes zero
E.MULX.I.M.16.C    Ensemble multiply extract immediate mixed-signed doublets ceiling
E.MULX.I.M.16.F    Ensemble multiply extract immediate mixed-signed doublets floor
E.MULX.I.M.16.N    Ensemble multiply extract immediate mixed-signed doublets nearest
E.MULX.I.M.16.Z    Ensemble multiply extract immediate mixed-signed doublets zero
E.MULX.I.M.32.C    Ensemble multiply extract immediate mixed-signed quadlets ceiling
E.MULX.I.M.32.F    Ensemble multiply extract immediate mixed-signed quadlets floor
E.MULX.I.M.32.N    Ensemble multiply extract immediate mixed-signed quadlets nearest
E.MULX.I.M.32.Z    Ensemble multiply extract immediate mixed-signed quadlets zero
E.MULX.I.M.64.C    Ensemble multiply extract immediate mixed-signed octlets ceiling
E.MULX.I.M.64.F    Ensemble multiply extract immediate mixed-signed octlets floor
E.MULX.I.M.64.N    Ensemble multiply extract immediate mixed-signed octlets nearest
E.MULX.I.M.64.Z    Ensemble multiply extract immediate mixed-signed octlets zero
E.MULX.I.U.8.C     Ensemble multiply extract immediate unsigned bytes ceiling
E.MULX.I.U.8.F     Ensemble multiply extract immediate unsigned bytes floor
E.MULX.I.U.8.N     Ensemble multiply extract immediate unsigned bytes nearest
E.MULX.I.U.16.C    Ensemble multiply extract immediate unsigned doublets ceiling
E.MULX.I.U.16.F    Ensemble multiply extract immediate unsigned doublets floor
E.MULX.I.U.16.N    Ensemble multiply extract immediate unsigned doublets nearest
E.MULX.I.U.32.C    Ensemble multiply extract immediate unsigned quadlets ceiling
E.MULX.I.U.32.F    Ensemble multiply extract immediate unsigned quadlets floor
E.MULX.I.U.32.N    Ensemble multiply extract immediate unsigned quadlets nearest
E.MULX.I.U.64.C    Ensemble multiply extract immediate unsigned octlets ceiling
E.MULX.I.U.64.F    Ensemble multiply extract immediate unsigned octlets floor
E.MULX.I.U.64.N    Ensemble multiply extract immediate unsigned octlets nearest



Format 

E.op.size.rnd rd=rc,rb,i
rd=eopsizernd(rc,rb,i)

 31     24 23    18 17    12 11     6 5  4 3  2 1  0
| E.op    | rd     | rc     | rb     | sz | rnd | sh |
     8        6        6        6      2    2     2

sz ← log(size) - 3
case op of
    E.EXTRACT.I, E.EXTRACT.I.U, E.MULX.I, E.MULX.I.U, E.MULX.I.M:
        assert size-3 ≤ i ≤ size
        sh ← size - i
    E.MULX.I.C:
        assert size-2 ≤ i ≤ size+1
        sh ← size + 1 - i
endcase
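As a cross-check of the Format pseudocode above, the following Python sketch computes the sz and sh fields from size and the immediate i. It then packs all fields into a 32-bit word using the widths shown in the layout. The function names are hypothetical, and so are the integer op_code and rnd arguments: this section does not give the numeric encodings of E.op or of the rounding options.

```python
def extract_immediate_fields(size: int, i: int, complex_op: bool = False) -> tuple[int, int]:
    """Return (sz, sh) as computed by the Format pseudocode (sketch)."""
    if size not in (8, 16, 32, 64):
        raise ValueError("size must be 8, 16, 32 or 64")
    sz = size.bit_length() - 4            # log2(size) - 3 for a power of two
    sh = (size + 1 - i) if complex_op else (size - i)
    if not 0 <= sh <= 3:                  # sh must fit the 2-bit field
        raise ValueError("immediate i is out of range for this size")
    return sz, sh


def pack_word(op_code: int, rd: int, rc: int, rb: int, sz: int, rnd: int, sh: int) -> int:
    """Pack fields into a 32-bit word with widths 8, 6, 6, 6, 2, 2, 2 (bits 31..0).

    op_code and rnd are placeholder integers: their real encodings are not given here.
    """
    fields = ((op_code, 8), (rd, 6), (rc, 6), (rb, 6), (sz, 2), (rnd, 2), (sh, 2))
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError("field value does not fit its width")
        word = (word << width) | value    # append the next field below the previous ones
    return word
```

For example, a doublets form with i = 15 gives sz = 1 and sh = 1. An immediate outside the asserted range is rejected because sh would not fit in two bits.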

The contents of registers rc and rb are partitioned into groups of operands of the size
specified and multiplied, added or subtracted, or are catenated and partitioned into operands
of twice the size specified. The group of values is rounded, and limited as specified, yielding
a group of results, each of which is the size specified. The group of results is catenated and
placed in register rd.

For mixed-signed multiplies, the contents of register rc are signed, and the contents of register
rb are unsigned. The extraction operation and the result of mixed-signed multiplies are signed.

Z (zero) rounding is not defined for unsigned extract operations, and a ReservedInstruction
exception is raised if attempted. F (floor) rounding will properly round unsigned results
downward.
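The rounding and limiting step can be modeled for a single lane. The Python sketch below is an illustration under stated assumptions: the lane product p is an h-bit two's complement pattern, r is the bit position of the result's least significant bit, and round_and_limit is a hypothetical helper name. It follows the s and v computations of this section's definition. N rounds half to even, Z truncates toward zero, F rounds down and C rounds up. Results that do not fit saturate to the largest or smallest size-bit value.

```python
def round_and_limit(p: int, size: int, h: int, r: int, rnd: str, signed: bool) -> int:
    """Round the h-bit value p at bit r, then saturate it to a size-bit result (sketch).

    rnd is "N" (nearest, ties to even), "Z", "F" or "C".
    The result is returned as an unsigned size-bit pattern.
    """
    if not signed and rnd == "Z":
        raise ValueError("ReservedInstruction: Z rounding is undefined for unsigned")
    p &= (1 << h) - 1
    sign = (p >> (h - 1)) & 1 if signed else 0
    if r == 0 or rnd == "F":
        s = 0                                     # floor: simply drop the low r bits
    elif rnd == "N":
        s = (1 << (r - 1)) - 1 + ((p >> r) & 1)   # half minus one, plus the kept LSB
    elif rnd == "Z":
        s = (1 << r) - 1 if sign else 0           # negative values round up toward zero
    elif rnd == "C":
        s = (1 << r) - 1                          # ceiling
    else:
        raise ValueError("unknown rounding mode")
    v = (((sign << h) | p) + s) & ((1 << (h + 1)) - 1)   # (h+1)-bit sum
    top_width = h + 1 - r - size
    fill = (v >> (r + size - 1)) & 1 if signed else 0
    if (v >> (r + size)) == (((1 << top_width) - 1) if fill else 0):
        return (v >> r) & ((1 << size) - 1)       # in range: bits r+size-1..r
    if signed:
        return (1 << (size - 1)) if (v >> h) & 1 else (1 << (size - 1)) - 1
    return (1 << size) - 1                        # unsigned overflow saturates to all ones
```

With size 8, h 16 and r 8, a product of 0x0180 (1.5 in units of 2**8) rounds to 2 under N. A product of 0x0280 (2.5) also rounds to 2, because ties go to the even value.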
An ensemble multiply extract immediate doublets instruction (E.MULX.I.16 or
E.MULX.I.U.16) multiplies operand [h g f e d c b a] by operand [p o n m l k j i], yielding the
products [hp go fn em dl ck bj ai], rounded and limited as specified:

[Figure: Ensemble multiply extract immediate doublets]



Another illustration of ensemble multiply extract immediate doublets instruction
(E.MULX.I.16 or E.MULX.I.U.16):

[Figure: Ensemble multiply extract immediate doublets (128-bit rc, rb and rd)]
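A simplified Python sketch of the doublets case may help. It models E.MULX.I.16.F and E.MULX.I.U.16.F only. It unpacks eight 16-bit lanes from each 128-bit operand, with lane 0 least significant (a and i in the text). Corresponding lanes are multiplied, floored at bit r = h - size - sh, and saturated. Only F rounding is modeled, and the helper names pack, unpack and mulx_i_16_floor are hypothetical.

```python
def pack(lanes: list[int], size: int) -> int:
    """Pack lanes into one integer; lane 0 is least significant ('a' in the text)."""
    reg = 0
    for k, value in enumerate(lanes):
        reg |= (value & ((1 << size) - 1)) << (size * k)
    return reg


def unpack(reg: int, size: int, count: int, signed: bool) -> list[int]:
    """Split reg into count lanes, sign-extending each lane when signed is True."""
    lanes = []
    for k in range(count):
        x = (reg >> (size * k)) & ((1 << size) - 1)
        if signed and x >> (size - 1):
            x -= 1 << size
        lanes.append(x)
    return lanes


def mulx_i_16_floor(c: int, b: int, sh: int, signed: bool = True) -> int:
    """Sketch of E.MULX.I.16.F (signed) or E.MULX.I.U.16.F (unsigned) on 128-bit values."""
    size, h = 16, 32
    r = h - size - sh                        # bit position of the result's LSB
    if signed:
        lo, hi = -(1 << (size - 1)), (1 << (size - 1)) - 1
    else:
        lo, hi = 0, (1 << size) - 1
    out = []
    for x, y in zip(unpack(c, size, 8, signed), unpack(b, size, 8, signed)):
        q = (x * y) >> r                     # >> on Python ints rounds toward minus infinity (floor)
        out.append(max(lo, min(hi, q)))      # saturate instead of wrapping
    return pack(out, size)
```

With sh = 1 (so r = 15) this behaves like a Q15 fixed-point multiply. In that format 0x4000 (0.5) times 0x4000 gives 0x2000 (0.25), and -1.0 times -1.0 saturates to 0x7FFF.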
An ensemble multiply extract immediate complex doublets instruction (E.MULX.I.C.16 or
E.MULX.I.U.16) multiplies operand [h g f e d c b a] by operand [p o n m l k j i], yielding the
result [gp+ho go-hp en+fm em-fn cl+dk ck-dl aj+bi ai-bj], rounded and limited as specified.
Note that this instruction prefers an organization of complex numbers in which the real part
is located to the right (lower precision) of the imaginary part:



[Figure: Ensemble multiply extract immediate complex doublets]



Another illustration of ensemble multiply extract immediate complex doublets instruction
(E.MULX.I.C.16 or E.MULX.I.U.16):

[Figure: Ensemble multiply extract immediate complex doublets (128-bit rc, rb and rd)]
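The pairing of real and imaginary parts can be checked with a small Python sketch. The sketch assumes lanes are listed from least significant upward. Each even-indexed lane holds a real part and the next lane holds its imaginary part, so every pair multiplies like a complex number; rounding and limiting are omitted. The function name is hypothetical. The last assertion shows why the definition widens h to 2*size+1 for complex products: two products of -32768 sum to 2**31, which does not fit in a signed 32-bit value.

```python
def complex_lanes_product(c_lanes: list[int], b_lanes: list[int]) -> list[int]:
    """Multiply lane pairs as complex numbers, before any rounding or limiting.

    Lanes are listed from least significant upward; lane 2k holds a real part and
    lane 2k+1 the matching imaginary part (real part to the right of the imaginary part).
    """
    out = []
    for k in range(0, len(c_lanes), 2):
        c_re, c_im = c_lanes[k], c_lanes[k + 1]
        b_re, b_im = b_lanes[k], b_lanes[k + 1]
        out.append(c_re * b_re - c_im * b_im)   # e.g. ai - bj for the lowest pair
        out.append(c_re * b_im + c_im * b_re)   # e.g. aj + bi for the lowest pair
    return out
```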
MicroUnity    Zeus System Architecture    Tue, Aug 17, 1999    Instruction Set

def mul(size,h,vs,v,i,ws,w,j) as
    mul ← ((vs & v_{size-1+i})^{h-size} || v_{size-1+i..i}) * ((ws & w_{size-1+j})^{h-size} || w_{size-1+j..j})
enddef

def EnsembleExtractImmediate(op,rnd,size,rd,rc,rb,sh) as
    c ← RegRead(rc, 128)
    b ← RegRead(rb, 128)
    case op of
        E.EXTRACT.I, E.MULX.I, E.MULX.I.C:
            as ← 1
            cs ← 1
            bs ← 1
        E.MULX.I.M:
            as ← 1
            cs ← 0
            bs ← 1
        E.EXTRACT.I.U, E.MULX.I.U:
            as ← 0
            cs ← 0
            bs ← 0
            if rnd = Z then
                raise ReservedInstruction
            endif
    endcase
    case op of
        E.EXTRACT.I, E.EXTRACT.I.U, E.MULX.I, E.MULX.I.U, E.MULX.I.M:
            h ← 2*size
        E.MULX.I.C:
            h ← (2*size) + 1
    endcase
    r ← h - size - sh
    for i ← 0 to 128-size by size
        case op of
            E.EXTRACT.I, E.EXTRACT.I.U:
                p ← (c || b)_{2*(size+i)-1..2*i}
            E.MULX.I, E.MULX.I.M, E.MULX.I.U:
                p ← mul(size,h,cs,c,i,bs,b,i)
            E.MULX.I.C:
                if i & size = 0 then
                    p ← mul(size,h,cs,c,i,bs,b,i) - mul(size,h,cs,c,i+size,bs,b,i+size)
                else
                    p ← mul(size,h,cs,c,i,bs,b,i-size) + mul(size,h,cs,c,i-size,bs,b,i)
                endif
        endcase
        case rnd of
            none, N:
                s ← 0^{h-r} || p_r || ~p_r^{r-1}
            Z:
                s ← 0^{h-r} || p_{h-1}^{r}
            F:
                s ← 0^{h}
            C:
                s ← 0^{h-r} || 1^{r}
        endcase
        v ← ((as & p_{h-1}) || p) + (0 || s)
        if v_{h..r+size} = (as & v_{r+size-1})^{h+1-r-size} then
            a_{size-1+i..i} ← v_{size-1+r..r}
        else
            a_{size-1+i..i} ← as ? (v_h || ~v_h^{size-1}) : 1^{size}
        endif
    endfor
    RegWrite(rd, 128, a)
enddef

Exceptions 

Reserved Instruction
Ensemble Extract Immediate Inplace

These operations take operands from three registers and a short immediate value, perform
operations on partitions of bits in the operands, and place the catenated results in the third
register.

Operation codes 



E.MULADD.X.I.C.8.C     Ensemble multiply add extract immediate signed complex bytes ceiling
E.MULADD.X.I.C.8.F     Ensemble multiply add extract immediate signed complex bytes floor
E.MULADD.X.I.C.8.N     Ensemble multiply add extract immediate signed complex bytes nearest
E.MULADD.X.I.C.8.Z     Ensemble multiply add extract immediate signed complex bytes zero
E.MULADD.X.I.C.16.C    Ensemble multiply add extract immediate signed complex doublets ceiling
E.MULADD.X.I.C.16.F    Ensemble multiply add extract immediate signed complex doublets floor
E.MULADD.X.I.C.16.N    Ensemble multiply add extract immediate signed complex doublets nearest
E.MULADD.X.I.C.16.Z    Ensemble multiply add extract immediate signed complex doublets zero
E.MULADD.X.I.C.32.C    Ensemble multiply add extract immediate signed complex quadlets ceiling
E.MULADD.X.I.C.32.F    Ensemble multiply add extract immediate signed complex quadlets floor
E.MULADD.X.I.C.32.N    Ensemble multiply add extract immediate signed complex quadlets nearest
E.MULADD.X.I.C.32.Z    Ensemble multiply add extract immediate signed complex quadlets zero
E.MULADD.X.I.C.64.C    Ensemble multiply add extract immediate signed complex octlets ceiling
E.MULADD.X.I.C.64.F    Ensemble multiply add extract immediate signed complex octlets floor
E.MULADD.X.I.C.64.N    Ensemble multiply add extract immediate signed complex octlets nearest
E.MULADD.X.I.C.64.Z    Ensemble multiply add extract immediate signed complex octlets zero
E.MULADD.X.I.M.8.C     Ensemble multiply add extract immediate mixed-signed bytes ceiling
E.MULADD.X.I.M.8.F     Ensemble multiply add extract immediate mixed-signed bytes floor
E.MULADD.X.I.M.8.N     Ensemble multiply add extract immediate mixed-signed bytes nearest
E.MULADD.X.I.M.8.Z     Ensemble multiply add extract immediate mixed-signed bytes zero
E.MULADD.X.I.M.16.C    Ensemble multiply add extract immediate mixed-signed doublets ceiling
E.MULADD.X.I.M.16.F    Ensemble multiply add extract immediate mixed-signed doublets floor
E.MULADD.X.I.M.16.N    Ensemble multiply add extract immediate mixed-signed doublets nearest
E.MULADD.X.I.M.16.Z    Ensemble multiply add extract immediate mixed-signed doublets zero
E.MULADD.X.I.M.32.C    Ensemble multiply add extract immediate mixed-signed quadlets ceiling
E.MULADD.X.I.M.32.F    Ensemble multiply add extract immediate mixed-signed quadlets floor
E.MULADD.X.I.M.32.N    Ensemble multiply add extract immediate mixed-signed quadlets nearest
E.MULADD.X.I.M.32.Z    Ensemble multiply add extract immediate mixed-signed quadlets zero
E.MULADD.X.I.M.64.C    Ensemble multiply add extract immediate mixed-signed octlets ceiling
E.MULADD.X.I.M.64.F    Ensemble multiply add extract immediate mixed-signed octlets floor
E.MULADD.X.I.M.64.N    Ensemble multiply add extract immediate mixed-signed octlets nearest
E.MULADD.X.I.M.64.Z    Ensemble multiply add extract immediate mixed-signed octlets zero
E.MULADD.X.I.8.C       Ensemble multiply add extract immediate signed bytes ceiling
E.MULADD.X.I.8.F       Ensemble multiply add extract immediate signed bytes floor
E.MULADD.X.I.8.N       Ensemble multiply add extract immediate signed bytes nearest
E.MULADD.X.I.8.Z       Ensemble multiply add extract immediate signed bytes zero
E.MULADD.X.I.16.C      Ensemble multiply add extract immediate signed doublets ceiling
E.MULADD.X.I.16.F      Ensemble multiply add extract immediate signed doublets floor




E.MULADD.X.I.16.N      Ensemble multiply add extract immediate signed doublets nearest
E.MULADD.X.I.16.Z      Ensemble multiply add extract immediate signed doublets zero
E.MULADD.X.I.32.C      Ensemble multiply add extract immediate signed quadlets ceiling
E.MULADD.X.I.32.F      Ensemble multiply add extract immediate signed quadlets floor
E.MULADD.X.I.32.N      Ensemble multiply add extract immediate signed quadlets nearest
E.MULADD.X.I.32.Z      Ensemble multiply add extract immediate signed quadlets zero
E.MULADD.X.I.64.C      Ensemble multiply add extract immediate signed octlets ceiling
E.MULADD.X.I.64.F      Ensemble multiply add extract immediate signed octlets floor
E.MULADD.X.I.64.N      Ensemble multiply add extract immediate signed octlets nearest
E.MULADD.X.I.64.Z      Ensemble multiply add extract immediate signed octlets zero
E.MULADD.X.I.U.8.C     Ensemble multiply add extract immediate unsigned bytes ceiling
E.MULADD.X.I.U.8.F     Ensemble multiply add extract immediate unsigned bytes floor
E.MULADD.X.I.U.8.N     Ensemble multiply add extract immediate unsigned bytes nearest
E.MULADD.X.I.U.16.C    Ensemble multiply add extract immediate unsigned doublets ceiling
E.MULADD.X.I.U.16.F    Ensemble multiply add extract immediate unsigned doublets floor
E.MULADD.X.I.U.16.N    Ensemble multiply add extract immediate unsigned doublets nearest
E.MULADD.X.I.U.32.C    Ensemble multiply add extract immediate unsigned quadlets ceiling
E.MULADD.X.I.U.32.F    Ensemble multiply add extract immediate unsigned quadlets floor
E.MULADD.X.I.U.32.N    Ensemble multiply add extract immediate unsigned quadlets nearest
E.MULADD.X.I.U.64.C    Ensemble multiply add extract immediate unsigned octlets ceiling
E.MULADD.X.I.U.64.F    Ensemble multiply add extract immediate unsigned octlets floor
E.MULADD.X.I.U.64.N    Ensemble multiply add extract immediate unsigned octlets nearest



Format 

E.op.size.rnd rd=rc,rb,i

rd=eopsizernd(rd,rc,rb,i)

 31     24 23    18 17    12 11     6 5  4 3  2 1  0
| E.op    | rd     | rc     | rb     | sz | rnd | sh |
     8        6        6        6      2    2     2



sz ← log(size) - 3
case op of
    E.MULADD.X.I:
        sh ← size - i - 1
    E.MULADD.X.I.U, E.MULADD.X.I.M, E.MULADD.X.I.C:
        sh ← size - i
endcase



Description

The contents of registers rc and rb are partitioned into groups of operands of the size
specified and multiplied, added or subtracted, or are catenated and partitioned into operands
of twice the size specified. The contents of register rd are partitioned into groups of
operands of the size specified, sign- or zero-extended, shifted as specified, and then added
to the group of values computed. The group of values is rounded, and limited as specified,
yielding a group of results, each of which is the size specified. The group of results is
catenated and placed in register rd.

For mixed-signed multiplies, the contents of register rc are signed, and the contents of register
rb are unsigned. The extraction operation, the contents of register rd, and the result of
mixed-signed multiplies are signed.

Z (zero) rounding is not defined for unsigned extract operations, and a ReservedInstruction
exception is raised if attempted. F (floor) rounding will properly round unsigned results
downward.
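To illustrate how the shifted rd operand combines with the product, here is a single-lane Python sketch for F (floor) rounding. It assumes the lane values are already sign- or zero-extended Python integers. It also assumes the rd lane is aligned by shifting it left r bits, as in the di computation of the definition. The function name is hypothetical, and the other rounding modes are omitted.

```python
def muladd_extract_lane_floor(d: int, c: int, b: int, size: int, r: int,
                              signed: bool = True) -> int:
    """One lane of a multiply add extract with F (floor) rounding (sketch).

    d, c and b are lane values as Python ints, already sign- or zero-extended.
    d is shifted left r bits so it lines up with the kept bits of the product.
    """
    p = c * b + (d << r)
    q = p >> r                                   # floor division by 2**r
    if signed:
        lo, hi = -(1 << (size - 1)), (1 << (size - 1)) - 1
    else:
        lo, hi = 0, (1 << size) - 1
    return max(lo, min(hi, q))                   # saturate to the lane size
```

Because d is aligned to bit r, the rd lane is added at full integer precision to the floored product. For example, with r = 15, 0x4000 times 0x4000 plus 100 gives 0x2000 + 100.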

An ensemble multiply add extract immediate doublets instruction (E.MULADD.X.I.16 or
E.MULADD.X.I.U.16) multiplies operand [h g f e d c b a] by operand [p o n m l k j i], then
adds [x w v u t s r q], yielding the products [hp+x go+w fn+v em+u dl+t ck+s bj+r ai+q],
rounded and limited as specified:

[Figure: Ensemble multiply add extract immediate doublets]




Another illustration of ensemble multiply add extract immediate doublets instruction
(E.MULADD.X.I.16 or E.MULADD.X.I.U.16):

[Figure: Ensemble multiply add extract immediate doublets]
An ensemble multiply add extract immediate complex doublets instruction
(E.MULADD.X.I.C.16 or E.MULADD.X.I.U.16) multiplies operand [h g f e d c b a] by
operand [p o n m l k j i], then adds [x w v u t s r q], yielding the result [gp+ho+x go-hp+w
en+fm+v em-fn+u cl+dk+t ck-dl+s aj+bi+r ai-bj+q], rounded and limited as specified.
Note that this instruction prefers an organization of complex numbers in which the real part
is located to the right (lower precision) of the imaginary part:

[Figure: Ensemble multiply add extract immediate complex doublets]



def EnsembleExtractImmediateInplace(op,rnd,size,rd,rc,rb,sh) as
    d ← RegRead(rd, 128)
    c ← RegRead(rc, 128)
    b ← RegRead(rb, 128)
    case op of
        E.MULADD.X.I, E.MULADD.X.I.C:
            ds ← 1
            cs ← 1
            bs ← 1
        E.MULADD.X.I.M:
            ds ← 1
            cs ← 0
            bs ← 1
        E.MULADD.X.I.U:
            ds ← 0
            cs ← 0
            bs ← 0
            if rnd = Z then
                raise ReservedInstruction
            endif
    endcase
    case op of
        E.MULADD.X.I, E.MULADD.X.I.U, E.MULADD.X.I.M:
            h ← (2*size) + 1
        E.MULADD.X.I.C:
            h ← (2*size) + 2
    endcase
    r ← h - size - sh - 1 - (cs and bs)
    for i ← 0 to 128-size by size
        di ← ((ds and d_{i+size-1})^{h-size-r} || d_{i+size-1..i} || 0^{r})
        case op of
            E.MULADD.X.I, E.MULADD.X.I.M, E.MULADD.X.I.U:
                p ← mul(size,h,cs,c,i,bs,b,i) + di
            E.MULADD.X.I.C:
                if i & size = 0 then
                    p ← mul(size,h,cs,c,i,bs,b,i) - mul(size,h,cs,c,i+size,bs,b,i+size) + di
                else
                    p ← mul(size,h,cs,c,i,bs,b,i-size) + mul(size,h,cs,c,i-size,bs,b,i) + di
                endif
        endcase
        case rnd of
            none, N:
                s ← 0^{h-r} || p_r || ~p_r^{r-1}
            Z:
                s ← 0^{h-r} || p_{h-1}^{r}
            F:
                s ← 0^{h}
            C:
                s ← 0^{h-r} || 1^{r}
        endcase
        v ← ((ds & p_{h-1}) || p) + (0 || s)
        if v_{h..r+size} = (ds & v_{r+size-1})^{h+1-r-size} then
            a_{size-1+i..i} ← v_{size-1+r..r}
        else
            a_{size-1+i..i} ← ds ? (v_h || ~v_h^{size-1}) : 1^{size}
        endif
    endfor
    RegWrite(rd, 128, a)
enddef

Exceptions

Reserved Instruction
Ensemble Floating-point

These operations take two values from registers, perform a group of floating-point 
arithmetic operations on partitions of bits in the operands, and place the catenated results in 
a register. 



Operation codes

E.ADD.F.16         Ensemble add floating-point half
E.ADD.F.16.C       Ensemble add floating-point half ceiling
E.ADD.F.16.F       Ensemble add floating-point half floor
E.ADD.F.16.N       Ensemble add floating-point half nearest
E.ADD.F.16.X       Ensemble add floating-point half exact
E.ADD.F.16.Z       Ensemble add floating-point half zero
E.ADD.F.32         Ensemble add floating-point single
E.ADD.F.32.C       Ensemble add floating-point single ceiling
E.ADD.F.32.F       Ensemble add floating-point single floor
E.ADD.F.32.N       Ensemble add floating-point single nearest
E.ADD.F.32.X       Ensemble add floating-point single exact
E.ADD.F.32.Z       Ensemble add floating-point single zero
E.ADD.F.64         Ensemble add floating-point double
E.ADD.F.64.C       Ensemble add floating-point double ceiling
E.ADD.F.64.F       Ensemble add floating-point double floor
E.ADD.F.64.N       Ensemble add floating-point double nearest
E.ADD.F.64.X       Ensemble add floating-point double exact
E.ADD.F.64.Z       Ensemble add floating-point double zero
E.ADD.F.128        Ensemble add floating-point quad
E.ADD.F.128.C      Ensemble add floating-point quad ceiling
E.ADD.F.128.F      Ensemble add floating-point quad floor
E.ADD.F.128.N      Ensemble add floating-point quad nearest
E.ADD.F.128.X      Ensemble add floating-point quad exact
E.ADD.F.128.Z      Ensemble add floating-point quad zero
E.DIV.F.16         Ensemble divide floating-point half
E.DIV.F.16.C       Ensemble divide floating-point half ceiling
E.DIV.F.16.F       Ensemble divide floating-point half floor
E.DIV.F.16.N       Ensemble divide floating-point half nearest
E.DIV.F.16.X       Ensemble divide floating-point half exact
E.DIV.F.16.Z       Ensemble divide floating-point half zero
E.DIV.F.32         Ensemble divide floating-point single
E.DIV.F.32.C       Ensemble divide floating-point single ceiling
E.DIV.F.32.F       Ensemble divide floating-point single floor
E.DIV.F.32.N       Ensemble divide floating-point single nearest
E.DIV.F.32.X       Ensemble divide floating-point single exact
E.DIV.F.32.Z       Ensemble divide floating-point single zero
E.DIV.F.64         Ensemble divide floating-point double
E.DIV.F.64.C       Ensemble divide floating-point double ceiling
E.DIV.F.64.F       Ensemble divide floating-point double floor
E.DIV.F.64.N       Ensemble divide floating-point double nearest
E.DIV.F.64.X       Ensemble divide floating-point double exact
E.DIV.F.64.Z       Ensemble divide floating-point double zero
E.DIV.F.128        Ensemble divide floating-point quad
E.DIV.F.128.C      Ensemble divide floating-point quad ceiling
E.DIV.F.128.F      Ensemble divide floating-point quad floor
E.DIV.F.128.N      Ensemble divide floating-point quad nearest
E.DIV.F.128.X      Ensemble divide floating-point quad exact
E.DIV.F.128.Z      Ensemble divide floating-point quad zero
E.MUL.C.F.16       Ensemble multiply complex floating-point half
E.MUL.C.F.32       Ensemble multiply complex floating-point single
E.MUL.C.F.64       Ensemble multiply complex floating-point double
E.MUL.F.16         Ensemble multiply floating-point half
E.MUL.F.16.C       Ensemble multiply floating-point half ceiling
E.MUL.F.16.F       Ensemble multiply floating-point half floor
E.MUL.F.16.N       Ensemble multiply floating-point half nearest
E.MUL.F.16.X       Ensemble multiply floating-point half exact
E.MUL.F.16.Z       Ensemble multiply floating-point half zero
E.MUL.F.32         Ensemble multiply floating-point single
E.MUL.F.32.C       Ensemble multiply floating-point single ceiling
E.MUL.F.32.F       Ensemble multiply floating-point single floor
E.MUL.F.32.N       Ensemble multiply floating-point single nearest
E.MUL.F.32.X       Ensemble multiply floating-point single exact
E.MUL.F.32.Z       Ensemble multiply floating-point single zero
E.MUL.F.64         Ensemble multiply floating-point double
E.MUL.F.64.C       Ensemble multiply floating-point double ceiling
E.MUL.F.64.F       Ensemble multiply floating-point double floor
E.MUL.F.64.N       Ensemble multiply floating-point double nearest
E.MUL.F.64.X       Ensemble multiply floating-point double exact
E.MUL.F.64.Z       Ensemble multiply floating-point double zero
E.MUL.F.128        Ensemble multiply floating-point quad
E.MUL.F.128.C      Ensemble multiply floating-point quad ceiling
E.MUL.F.128.F      Ensemble multiply floating-point quad floor
E.MUL.F.128.N      Ensemble multiply floating-point quad nearest
E.MUL.F.128.X      Ensemble multiply floating-point quad exact
E.MUL.F.128.Z      Ensemble multiply floating-point quad zero



class              op           prec              round/trap
add                E.ADD.F      16 32 64 128      NONE C F N X Z
divide             E.DIV.F      16 32 64 128      NONE C F N X Z
multiply           E.MUL.F      16 32 64 128      NONE C F N X Z
complex multiply   E.MUL.C.F    16 32 64          NONE
Format

E.op.prec.round rd=rc,rb

rd=eopprecround(rc,rb)

 31     24 23    18 17    12 11     6 5          0
| E.prec  | rd     | rc     | rb     | op.round   |
     8        6        6        6          6

The contents of registers rc and rb are combined using the specified floating-point
operation. The result is placed in register rd. The operation is rounded using the specified
rounding option, or using round-to-nearest if no option is specified. If a rounding option is
specified, the operation raises a floating-point exception if a floating-point invalid operation,
divide by zero, overflow, or underflow occurs, or, when specified, if the result is inexact. If a
rounding option is not specified, floating-point exceptions are not raised, and are handled
according to the default rules of IEEE 754.
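A lane-wise model of E.ADD.F.32 with no rounding option specified can be sketched in Python with the struct module. That is the round-to-nearest case in which exceptions are not raised. The sketch assumes lane 0 is the least significant 32 bits and that overflow produces a signed infinity, as in the IEEE 754 default. The function names are hypothetical. Each sum is computed in double precision and then rounded once to single precision. This double rounding is harmless for addition, because the 53-bit double significand is at least 2*24+2 bits wide.

```python
import math
import struct


def f32_lanes(reg: int) -> list[float]:
    """Split a 128-bit register into four float32 lanes (lane 0 least significant)."""
    return [struct.unpack("<f", ((reg >> (32 * k)) & 0xFFFFFFFF).to_bytes(4, "little"))[0]
            for k in range(4)]


def pack_f32(values) -> int:
    """Round each value to float32 (nearest, ties to even) and pack, lane 0 first."""
    reg = 0
    for k, x in enumerate(values):
        try:
            raw = struct.pack("<f", x)
        except OverflowError:
            # IEEE 754 default for overflow under round-to-nearest: a signed infinity.
            raw = struct.pack("<f", math.copysign(math.inf, x))
        reg |= int.from_bytes(raw, "little") << (32 * k)
    return reg


def e_add_f32(c: int, b: int) -> int:
    """Lane-wise float32 add with default rounding and no trapping (sketch)."""
    return pack_f32(x + y for x, y in zip(f32_lanes(c), f32_lanes(b)))
```

In single precision, 1.0 + 2**-24 is exactly halfway between 1.0 and the next float32, so round-to-nearest-even returns 1.0.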

Definition

def mul(size,v,i,w,j) as
    mul ← fmul(F(size,v_{size-1+i..i}), F(size,w_{size-1+j..j}))
enddef

def EnsembleFloatingPoint(op,prec,round,rd,rc,rb) as
    c ← RegRead(rc, 128)
    b ← RegRead(rb, 128)
    for i ← 0 to 128-prec by prec
        ci ← F(prec, c_{i+prec-1..i})
        bi ← F(prec, b_{i+prec-1..i})
        case op of
            E.ADD.F:
                ai ← fadd(ci, bi, round)
            E.MUL.F:
                ai ← fmul(ci, bi)
            E.MUL.C.F:
                if (i and prec) then
                    ai ← fadd(mul(prec,c,i,b,i-prec), mul(prec,c,i-prec,b,i))
                else
                    ai ← fsub(mul(prec,c,i,b,i), mul(prec,c,i+prec,b,i+prec))
                endif
            E.DIV.F:
                ai ← fdiv(ci, bi)
        endcase
        a_{i+prec-1..i} ← PackF(prec, ai, round)
    endfor
    RegWrite(rd, 128, a)
enddef

Exceptions

Floating-point arithmetic
Ensemble Inplace

These operations take operands from three registers, perform operations on partitions of 
bits in the operands, and place the concatenated results in the third register. 



Operation codes 



E.MULADD.8         Ensemble multiply signed bytes add doublets
E.MULADD.16        Ensemble multiply signed doublets add quadlets
E.MULADD.32        Ensemble multiply signed quadlets add octlets
E.MULADD.64        Ensemble multiply signed octlets add hexlet
E.MULADD.C.8       Ensemble multiply complex bytes add doublets
E.MULADD.C.16      Ensemble multiply complex doublets add quadlets
E.MULADD.C.32      Ensemble multiply complex quadlets add octlets
E.MULADD.M.8       Ensemble multiply mixed-signed bytes add doublets
E.MULADD.M.16      Ensemble multiply mixed-signed doublets add quadlets
E.MULADD.M.32      Ensemble multiply mixed-signed quadlets add octlets
E.MULADD.M.64      Ensemble multiply mixed-signed octlets add hexlet
E.MULADD.U.8       Ensemble multiply unsigned bytes add doublets
E.MULADD.U.16      Ensemble multiply unsigned doublets add quadlets
E.MULADD.U.32      Ensemble multiply unsigned quadlets add octlets
E.MULADD.U.64      Ensemble multiply unsigned octlets add hexlet
E.MULSUB.8         Ensemble multiply signed bytes subtract doublets
E.MULSUB.16        Ensemble multiply signed doublets subtract quadlets
E.MULSUB.32        Ensemble multiply signed quadlets subtract octlets
E.MULSUB.64        Ensemble multiply signed octlets subtract hexlet
E.MULSUB.C.8       Ensemble multiply complex bytes subtract doublets
E.MULSUB.C.16      Ensemble multiply complex doublets subtract quadlets
E.MULSUB.C.32      Ensemble multiply complex quadlets subtract octlets
E.MULSUB.M.8       Ensemble multiply mixed-signed bytes subtract doublets
E.MULSUB.M.16      Ensemble multiply mixed-signed doublets subtract quadlets
E.MULSUB.M.32      Ensemble multiply mixed-signed quadlets subtract octlets
E.MULSUB.M.64      Ensemble multiply mixed-signed octlets subtract hexlet
E.MULSUB.U.8       Ensemble multiply unsigned bytes subtract doublets
E.MULSUB.U.16      Ensemble multiply unsigned doublets subtract quadlets
E.MULSUB.U.32      Ensemble multiply unsigned quadlets subtract octlets
E.MULSUB.U.64      Ensemble multiply unsigned octlets subtract hexlet



Selection

class              op                     type        prec
multiply           E.MULADD, E.MULSUB     NONE M U    8 16 32 64
complex multiply                          C           8 16 32
Format

E.op.size rd=rc,rb
rd=eopsize(rd,rc,rb)

 31     24 23    18 17    12 11     6 5      0
| E.size  | rd     | rc     | rb     | op     |
     8        6        6        6        6

The contents of registers rd, rc and rb are fetched. The specified operation is performed on
these operands. The result is placed into register rd.

Register rd is both a source and destination of this instruction.

Definition

def mul(size,h,vs,v,i,ws,w,j) as
    mul ← ((vs & v_{size-1+i})^{h-size} || v_{size-1+i..i}) * ((ws & w_{size-1+j})^{h-size} || w_{size-1+j..j})
enddef

def EnsembleInplace(op,size,rd,rc,rb) as
    if size=1 then
        raise ReservedInstruction
    endif
    d ← RegRead(rd, 128)
    c ← RegRead(rc, 128)
    b ← RegRead(rb, 128)
    case op of
        E.MULADD, E.MULSUB, E.MULADD.C, E.MULSUB.C:
            cs ← 1
            bs ← 1
        E.MULADD.M, E.MULSUB.M:
            cs ← 0
            bs ← 1
        E.MULADD.U, E.MULSUB.U:
            cs ← 0
            bs ← 0
    endcase
    h ← 2*size
    for i ← 0 to 64-size by size
        di ← d_{2*(i+size)-1..2*i}
        case op of
            E.MULADD, E.MULADD.U, E.MULADD.M:
                p ← mul(size,h,cs,c,i,bs,b,i) + di
            E.MULADD.C:
                if i & size = 0 then
                    p ← mul(size,h,cs,c,i,bs,b,i) - mul(size,h,cs,c,i+size,bs,b,i+size) + di
                else
                    p ← mul(size,h,cs,c,i,bs,b,i-size) + mul(size,h,cs,c,i-size,bs,b,i) + di
                endif
            E.MULSUB, E.MULSUB.U, E.MULSUB.M:
                p ← mul(size,h,cs,c,i,bs,b,i) - di



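            E.MULSUB.C:
                if i & size = 0 then
                    p ← mul(size,h,cs,c,i,bs,b,i) - mul(size,h,cs,c,i+size,bs,b,i+size) - di
                else
                    p ← mul(size,h,cs,c,i,bs,b,i-size) + mul(size,h,cs,c,i-size,bs,b,i) - di
                endif
        endcase
        a_{2*(i+size)-1..2*i} ← p
    endfor
    RegWrite(rd, 128, a)
enddef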
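A Python sketch of E.MULADD.16 may make the widening explicit. Only the low 64 bits of rc and rb are used, since the loop runs from 0 to 64-size. Each signed doublet product is added to the corresponding quadlet of rd, and the four 32-bit sums fill the 128-bit result. The function name is hypothetical. The sketch assumes sums wrap modulo 2**32, since the definition keeps h = 2*size bits and no saturation is shown.

```python
MASK32 = 0xFFFFFFFF


def e_muladd_16(d: int, c: int, b: int) -> int:
    """Sketch of E.MULADD.16: signed doublets times signed doublets, plus quadlets of rd."""
    size = 16
    a = 0
    for i in range(0, 64, size):                 # for i <- 0 to 64-size by size
        x = (c >> i) & 0xFFFF
        y = (b >> i) & 0xFFFF
        x -= (x >> 15) << 16                     # sign-extend (cs = 1)
        y -= (y >> 15) << 16                     # sign-extend (bs = 1)
        di = (d >> (2 * i)) & MASK32             # d_{2*(i+size)-1..2*i}
        p = (x * y + di) & MASK32                # keep h = 2*size = 32 bits
        a |= p << (2 * i)                        # a_{2*(i+size)-1..2*i} <- p
    return a
```

Data in the upper 64 bits of rc and rb does not affect the result. Each product lane i lands in the quadlet that starts at bit 2*i.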

Exceptions

none
Ensemble Inplace Floating-point

These operations take operands from three registers, perform operations on partitions of
bits in the operands, and place the concatenated results in the third register.



Operation codes 



E.MULADD.C.F.16      Ensemble multiply add complex floating-point half
E.MULADD.C.F.32      Ensemble multiply add complex floating-point single
E.MULADD.C.F.64      Ensemble multiply add complex floating-point double
E.MULADD.F.16        Ensemble multiply add floating-point half
E.MULADD.F.16.C      Ensemble multiply add floating-point half ceiling
E.MULADD.F.16.F      Ensemble multiply add floating-point half floor
E.MULADD.F.16.N      Ensemble multiply add floating-point half nearest
E.MULADD.F.16.X      Ensemble multiply add floating-point half exact
E.MULADD.F.16.Z      Ensemble multiply add floating-point half zero
E.MULADD.F.32        Ensemble multiply add floating-point single
E.MULADD.F.32.C      Ensemble multiply add floating-point single ceiling
E.MULADD.F.32.F      Ensemble multiply add floating-point single floor
E.MULADD.F.32.N      Ensemble multiply add floating-point single nearest
E.MULADD.F.32.X      Ensemble multiply add floating-point single exact
E.MULADD.F.32.Z      Ensemble multiply add floating-point single zero
E.MULADD.F.64        Ensemble multiply add floating-point double
E.MULADD.F.64.C      Ensemble multiply add floating-point double ceiling
E.MULADD.F.64.F      Ensemble multiply add floating-point double floor
E.MULADD.F.64.N      Ensemble multiply add floating-point double nearest
E.MULADD.F.64.X      Ensemble multiply add floating-point double exact
E.MULADD.F.64.Z      Ensemble multiply add floating-point double zero
E.MULADD.F.128       Ensemble multiply add floating-point quad
E.MULADD.F.128.C     Ensemble multiply add floating-point quad ceiling
E.MULADD.F.128.F     Ensemble multiply add floating-point quad floor
E.MULADD.F.128.N     Ensemble multiply add floating-point quad nearest
E.MULADD.F.128.X     Ensemble multiply add floating-point quad exact
E.MULADD.F.128.Z     Ensemble multiply add floating-point quad zero
E.MULSUB.C.F.16      Ensemble multiply subtract complex floating-point half
E.MULSUB.C.F.32      Ensemble multiply subtract complex floating-point single
E.MULSUB.C.F.64      Ensemble multiply subtract complex floating-point double
E.MULSUB.F.16        Ensemble multiply subtract floating-point half
E.MULSUB.F.32        Ensemble multiply subtract floating-point single
E.MULSUB.F.64        Ensemble multiply subtract floating-point double
E.MULSUB.F.128       Ensemble multiply subtract floating-point quad
class               op          type    prec             round/trap
multiply add        E.MULADD    F       16 32 64 128     NONE C F N X Z
                                C.F     16 32 64         NONE
multiply subtract   E.MULSUB    F       16 32 64 128     NONE
                                C.F     16 32 64         NONE



Format 

E.op.size rd=rc,rb

rd=eopsize(rd,rc,rb)

 31     24 23    18 17    12 11     6 5      0
| E.size  | rd     | rc     | rb     | op     |
     8        6        6        6        6

Description 

The contents of registers rd, rc and rb are fetched. The specified operation is performed on
these operands. The result is placed into register rd.

Register rd is both a source and destination of this instruction. 
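For the floating-point forms, a Python sketch of E.MULADD.F.64 and E.MULSUB.F.64 with default rounding follows. It rests on three assumptions. Each 128-bit register holds two 64-bit lanes. The product is rounded before the addition, as two separate IEEE operations rather than a fused multiply-add; the definition does not say whether the intermediate product is rounded. And frsub(di, x) computes x - di, by analogy with the integer E.MULSUB. All function names are hypothetical.

```python
import struct

MASK64 = (1 << 64) - 1


def f64_lanes(reg: int) -> list[float]:
    """Split a 128-bit register into two float64 lanes (lane 0 least significant)."""
    return [struct.unpack("<d", ((reg >> (64 * k)) & MASK64).to_bytes(8, "little"))[0]
            for k in range(2)]


def pack_f64(values) -> int:
    """Pack float64 values into a register, lane 0 first."""
    reg = 0
    for k, x in enumerate(values):
        reg |= int.from_bytes(struct.pack("<d", x), "little") << (64 * k)
    return reg


def e_muladd_f64(d: int, c: int, b: int) -> int:
    """rd + rc*rb per lane; the product is rounded before the add (not fused)."""
    return pack_f64(di + ci * bi
                    for di, ci, bi in zip(f64_lanes(d), f64_lanes(c), f64_lanes(b)))


def e_mulsub_f64(d: int, c: int, b: int) -> int:
    """rc*rb - rd per lane, assuming frsub(di, x) means x - di."""
    return pack_f64(ci * bi - di
                    for di, ci, bi in zip(f64_lanes(d), f64_lanes(c), f64_lanes(b)))
```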

def mul(size,v,i,w,j) as
    mul ← fmul(F(size,v_{size-1+i..i}), F(size,w_{size-1+j..j}))
enddef

def EnsembleInplaceFloatingPoint(op,prec,round,rd,rc,rb) as
    d ← RegRead(rd, 128)
    c ← RegRead(rc, 128)
    b ← RegRead(rb, 128)
    for i ← 0 to 128-prec by prec
        di ← F(prec, d_{i+prec-1..i})
        case op of
            E.MULADD.F:
                ai ← fadd(di, mul(prec,c,i,b,i))
            E.MULADD.C.F:
                if (i and prec) then
                    ai ← fadd(di, fadd(mul(prec,c,i,b,i-prec), mul(prec,c,i-prec,b,i)))
                else
                    ai ← fadd(di, fsub(mul(prec,c,i,b,i), mul(prec,c,i+prec,b,i+prec)))
                endif
            E.MULSUB.F:
                ai ← frsub(di, mul(prec,c,i,b,i))
            E.MULSUB.C.F:
                if (i and prec) then
                    ai ← frsub(di, fadd(mul(prec,c,i,b,i-prec), mul(prec,c,i-prec,b,i)))
                else
                    ai ← frsub(di, fsub(mul(prec,c,i,b,i), mul(prec,c,i+prec,b,i+prec)))
                endif
        endcase
        a_{i+prec-1..i} ← PackF(prec, ai, round)
    endfor
    RegWrite(rd, 128, a)
enddef
Exceptions 

none 




Ensemble Reversed Floating-point 

These operations take two values from registers, perform a group of floating-point
arithmetic operations on partitions of bits in the operands, and place the concatenated
results in a register.



Operation codes

E.SUB.F.16          Ensemble subtract floating-point half
E.SUB.F.16.C        Ensemble subtract floating-point half ceiling
E.SUB.F.16.F        Ensemble subtract floating-point half floor
E.SUB.F.16.N        Ensemble subtract floating-point half nearest
E.SUB.F.16.Z        Ensemble subtract floating-point half zero
E.SUB.F.16.X        Ensemble subtract floating-point half exact
E.SUB.F.32          Ensemble subtract floating-point single
E.SUB.F.32.C        Ensemble subtract floating-point single ceiling
E.SUB.F.32.F        Ensemble subtract floating-point single floor
E.SUB.F.32.N        Ensemble subtract floating-point single nearest
E.SUB.F.32.Z        Ensemble subtract floating-point single zero
E.SUB.F.32.X        Ensemble subtract floating-point single exact
E.SUB.F.64          Ensemble subtract floating-point double
E.SUB.F.64.C        Ensemble subtract floating-point double ceiling
E.SUB.F.64.F        Ensemble subtract floating-point double floor
E.SUB.F.64.N        Ensemble subtract floating-point double nearest
E.SUB.F.64.Z        Ensemble subtract floating-point double zero
E.SUB.F.64.X        Ensemble subtract floating-point double exact
E.SUB.F.128         Ensemble subtract floating-point quad
E.SUB.F.128.C       Ensemble subtract floating-point quad ceiling
E.SUB.F.128.F       Ensemble subtract floating-point quad floor
E.SUB.F.128.N       Ensemble subtract floating-point quad nearest
E.SUB.F.128.Z       Ensemble subtract floating-point quad zero
E.SUB.F.128.X       Ensemble subtract floating-point quad exact



Selection

class       op                              prec            round/trap
set         SET.E SET.LG SET.L SET.GE       16 32 64 128    NONE X
subtract    SUB                             16 32 64 128    NONE C F N X Z




Format 

E.op.prec.round rd=rb,rc
rd=eopprecround(rb,rc)

31 24 23 18 17 12 11 6 5 0

| E.prec | rd | rc | rb | op.round |

8 6 6 6 6

Description

The contents of registers rc and rb are combined using the specified floating-point
operation. The result is placed in register rd. The operation is rounded using the specified
rounding option or using round-to-nearest if not specified. If a rounding option is specified,
the operation raises a floating-point exception if a floating-point invalid operation, divide by
zero, overflow, or underflow occurs, or when specified, if the result is inexact. If a rounding
option is not specified, floating-point exceptions are not raised, and are handled according to
the default rules of IEEE 754.
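The assembler form rd=rb,rc suggests that each lane of E.SUB.F computes rb minus rc, which is what makes this a "reversed" operation; the sketch below assumes that reading. It models only the default round-to-nearest case for double (64-bit) lanes, treating a 128-bit register as a Python integer with lane 0 in bits 63..0. Directed rounding, trapping, and the SET comparisons are not modeled.

```python
import struct

def unpack_f64_lanes(reg):
    """Split a 128-bit register value into two binary64 lanes, lane 0 = bits 63..0."""
    return list(struct.unpack("<2d", reg.to_bytes(16, "little")))

def pack_f64_lanes(lanes):
    """Inverse of unpack_f64_lanes: two floats back into one 128-bit integer."""
    return int.from_bytes(struct.pack("<2d", *lanes), "little")

def ensemble_sub_f64(rb, rc):
    """E.SUB.F.64 with default rounding, assuming each lane computes rb - rc."""
    b = unpack_f64_lanes(rb)
    c = unpack_f64_lanes(rc)
    return pack_f64_lanes([bi - ci for bi, ci in zip(b, c)])
```

Packing through little-endian bytes keeps lane 0 in the least-significant bits, matching the i ← 0 first iteration of the definition.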

Definition 

def EnsembleReversedFloatingPoint(op,prec,round,rd,rc,rb) as
    c ← RegRead(rc, 128)
    b ← RegRead(rb, 128)
    for i ← 0 to 128-prec by prec
        ci ← F(prec, c[i+prec-1..i])
        bi ← F(prec, b[i+prec-1..i])
        ai ← frsub(ci, bi, round)
        a[i+prec-1..i] ← PackF(prec, ai, round)
    endfor
    RegWrite(rd, 128, a)
enddef

Exceptions 

Floating-point arithmetic



Ensemble Ternary

These operations take three values from registers, perform a group of calculations on
partitions of bits of the operands, and place the catenated results in a fourth register.

Operation codes



E.MUL.G.8           Ensemble multiply Galois field byte
E.MUL.G.64          Ensemble multiply Galois field octlet



Format 

E.MUL.G.size ra=rd,rc,rb

ra=emulgsize(rd,rc,rb)

31 24 23 18 17 12 11 6 5 0

| E.MUL.G.size | rd | rc | rb | ra |

8 6 6 6 6



Description

The contents of registers rd, rc, and rb are fetched. The specified operation is performed on
these operands. The result is placed into register ra.

The contents of registers rd and rc are partitioned into groups of operands of the size
specified and multiplied in the manner of polynomials. The group of values is reduced
modulo the polynomial specified by the contents of register rb, yielding a group of results,
each of which is the size specified. The group of results is catenated and placed in register ra.
An ensemble multiply Galois field bytes instruction (E.MUL.G.8) multiplies operand
[d15 d14 ... d1 d0] by operand [c15 c14 ... c1 c0], modulo polynomial [q], yielding the
results [(d15c15 mod q) (d14c14 mod q) ... (d0c0 mod q)]:




Figure: Ensemble multiply Galois field bytes
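The multiply-then-reduce step can be sketched in Python. This assumes, as the PolyResidue routine in the definition suggests, that the size-bit field taken from rb holds the low-order coefficients of a degree-size modulus whose leading x**size term is implicit. With q = 0x1B this is the AES field GF(2^8) modulo x^8 + x^4 + x^3 + x + 1, which provides a convenient check value. The function names are illustrative, not part of the architecture.

```python
def poly_multiply(size, a, b):
    """Carry-less (polynomial over GF(2)) product of two size-bit values."""
    p = 0
    for k in range(size):
        if (a >> k) & 1:
            p ^= b << k
    return p

def poly_residue(size, a, q):
    """Reduce a 2*size-bit polynomial modulo x**size + q (q = low size bits)."""
    modulus = (1 << size) | q
    for k in range(size - 1, -1, -1):
        if (a >> (size + k)) & 1:       # coefficient of x**(size+k) is set
            a ^= modulus << k           # cancel it, folding q into lower bits
    return a

def ensemble_multiply_galois(size, d, c, q, width=128):
    """Lane-wise GF(2**size) multiply of two packed width-bit register values."""
    mask = (1 << size) - 1
    result = 0
    for i in range(0, width, size):
        di = (d >> i) & mask
        ci = (c >> i) & mask
        result |= poly_residue(size, poly_multiply(size, di, ci), q) << i
    return result
```

For instance, 0x57 times 0x83 reduced modulo 0x1B gives 0xC1, the worked example used for the AES field.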



Definition 

def c ← PolyMultiply(size,a,b) as
    p[0] ← 0^(2*size)
    for k ← 0 to size-1
        p[k+1] ← p[k] xor (a[k] ? (0^(size-k) || b || 0^k) : 0^(2*size))
    endfor
    c ← p[size]
enddef

def c ← PolyResidue(size,a,b) as
    p ← a
    for k ← size-1 to 0 by -1
        if p[size+k] then
            p ← p xor (0^(size-k-1) || 1 || b || 0^k)
        endif
    endfor
    c ← p[size-1..0]
enddef

def EnsembleTernary(op,size,rd,rc,rb,ra) as
    d ← RegRead(rd, 128)
    c ← RegRead(rc, 128)
    b ← RegRead(rb, 128)
    case op of
        E.MUL.G:
            for i ← 0 to 128-size by size
                a[i+size-1..i] ← PolyResidue(size, PolyMultiply(size, c[i+size-1..i], d[i+size-1..i]), b[size-1..0])
            endfor
    endcase
    RegWrite(ra, 128, a)
enddef

Exceptions 



Ensemble Ternary Floating-point

These operations take three values from registers, perform a group of floating-point
arithmetic operations on partitions of bits in the operands, and place the concatenated
results in a register.



Operation codes



E.SCALADD.F.16      Ensemble scale add floating-point half
E.SCALADD.F.32      Ensemble scale add floating-point single
E.SCALADD.F.64      Ensemble scale add floating-point double



Selection

class         op             prec
scale add     E.SCALADD.F    16 32 64



Format 

E.SCALADD.F.size ra=rd,rc,rb
ra=escaladdfsize(rd,rc,rb)

31 24 23 18 17 12 11 6 5 0

| op | rd | rc | rb | ra |

8 6 6 6 6

Description

The contents of registers rd and rc are taken to represent a group of floating-point operands.
Operands from register rd are multiplied with a floating-point operand taken from the
least-significant bits of the contents of register rb and added to operands from register rc
multiplied with a floating-point operand taken from the next least-significant bits of the
contents of register rb. The results are concatenated and placed in register ra. The results are
rounded to the nearest representable floating-point value in a single floating-point operation.
Floating-point exceptions are not raised, and are handled according to the default rules of
IEEE 754. These instructions cannot select a directed rounding mode or trap on inexact.
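The phrase "rounded ... in a single floating-point operation" means d*b0 + c*b1 is formed exactly and rounded once. A minimal Python sketch for double (64-bit) lanes shows the difference. It computes the exact sum with fractions.Fraction and converts once with float(), which rounds to nearest with ties to even. b_lanes[0] stands for the least-significant field of rb and b_lanes[1] for the next one. Only finite inputs are handled, because Fraction rejects infinities and NaNs.

```python
from fractions import Fraction

def scale_add_lane(d, c, b0, b1):
    """One lane of E.SCALADD.F.64: d*b0 + c*b1 with a single final rounding."""
    exact = Fraction(d) * Fraction(b0) + Fraction(c) * Fraction(b1)
    return float(exact)  # the only rounding step

def ensemble_scale_add_f64(d_lanes, c_lanes, b_lanes):
    """Apply the same two scale factors, taken from rb, to every lane pair."""
    b0, b1 = b_lanes[0], b_lanes[1]
    return [scale_add_lane(d, c, b0, b1) for d, c in zip(d_lanes, c_lanes)]
```

With d = 0.1, b0 = 10.0, c = -1.0, b1 = 1.0, rounding after each step gives 0.0 because 0.1*10.0 rounds to exactly 1.0. The single-rounding form keeps the tiny residue 2**-54 that separates the binary64 value nearest 0.1 from one tenth.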

Definition 

def EnsembleFloatingTernary(op,prec,rd,rc,rb,ra) as
    d ← RegRead(rd, 128)
    c ← RegRead(rc, 128)
    b ← RegRead(rb, 128)
    for i ← 0 to 128-prec by prec
        di ← F(prec, d[i+prec-1..i])
        ci ← F(prec, c[i+prec-1..i])
        ai ← fadd(fmul(di, F(prec, b[prec-1..0])), fmul(ci, F(prec, b[prec+prec-1..prec])))
        a[i+prec-1..i] ← PackF(prec, ai, none)
    endfor
    RegWrite(ra, 128, a)
enddef



Ensemble Unary

These operations take operands from a register, perform operations on partitions of bits in
the operand, and place the concatenated results in a second register.



Operation codes 



E.LOG.MOST.8        Ensemble log of most significant bit signed bytes
E.LOG.MOST.16       Ensemble log of most significant bit signed doublets
E.LOG.MOST.32       Ensemble log of most significant bit signed quadlets
E.LOG.MOST.64       Ensemble log of most significant bit signed octlets
E.LOG.MOST.128      Ensemble log of most significant bit signed hexlet
E.LOG.MOST.U.8      Ensemble log of most significant bit unsigned bytes
E.LOG.MOST.U.16     Ensemble log of most significant bit unsigned doublets
E.LOG.MOST.U.32     Ensemble log of most significant bit unsigned quadlets
E.LOG.MOST.U.64     Ensemble log of most significant bit unsigned octlets
E.LOG.MOST.U.128    Ensemble log of most significant bit unsigned hexlet
E.SUM.8             Ensemble sum signed bytes
E.SUM.16            Ensemble sum signed doublets
E.SUM.32            Ensemble sum signed quadlets
E.SUM.64            Ensemble sum signed octlets
E.SUM.U.1*          Ensemble sum unsigned bits
E.SUM.U.8           Ensemble sum unsigned bytes
E.SUM.U.16          Ensemble sum unsigned doublets
E.SUM.U.32          Ensemble sum unsigned quadlets
E.SUM.U.64          Ensemble sum unsigned octlets



Selection

class                       op                       size
sum                         SUM                      8 16 32 64
                            SUM.U                    1 8 16 32 64
log most significant bit    LOG.MOST LOG.MOST.U      8 16 32 64 128



Format 

E.op.size rd=rc
rd=eopsize(rc)

31 24 23 18 17 12 11 6 5 0

| E.size | rd | rc | op | E.UNARY |
8 6 6 6 6

* E.SUM.U.1 is encoded as E.SUM.U.128.
Description

Values are taken from the contents of register rc. The specified operation is performed, and
the result is placed in register rd.
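A minimal Python model of E.SUM, E.SUM.U, and E.LOG.MOST.U on a 128-bit register held as a Python integer follows. The helper names are illustrative. The sum wraps modulo 2**128, as the definition's 128-bit accumulator does. For zero lanes the sketch assumes, as the definition appears to, that E.LOG.MOST.U writes -1, which is stored as all ones in that lane.

```python
WIDTH = 128  # register width in bits

def lanes(value, size, width=WIDTH):
    """Split a packed register value into size-bit lanes, lane 0 first."""
    mask = (1 << size) - 1
    return [(value >> i) & mask for i in range(0, width, size)]

def to_signed(x, size):
    """Interpret a size-bit lane as a two's-complement integer."""
    return x - (1 << size) if (x >> (size - 1)) & 1 else x

def ensemble_sum(value, size, signed=True):
    """E.SUM / E.SUM.U: add every lane into one 128-bit result (mod 2**128)."""
    total = sum(to_signed(x, size) if signed else x for x in lanes(value, size))
    return total & ((1 << WIDTH) - 1)

def ensemble_log_most_unsigned(value, size):
    """E.LOG.MOST.U: per lane, index of the highest set bit (-1 for a zero lane)."""
    mask = (1 << size) - 1
    out = 0
    for n, x in enumerate(lanes(value, size)):
        r = x.bit_length() - 1           # -1 when x == 0
        out |= (r & mask) << (n * size)  # -1 is stored as all ones in the lane
    return out
```

E.SUM.U.1 reduces to a population count: ensemble_sum(0b1011, 1, signed=False) returns 3.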

Definition

def EnsembleUnary(op,size,rd,rc) as
    c ← RegRead(rc, 128)
    case op of
        E.LOG.MOST:
            for i ← 0 to 128-size by size
                if (c[size-1+i..i] = 0) then
                    a[size-1+i..i] ← -1
                else
                    for j ← 0 to size-1
                        if c[size-1+i..j+i] = (c[size-1+i]^(size-1-j) || ~c[size-1+i]) then
                            a[size-1+i..i] ← j
                        endif
                    endfor
                endif
            endfor
        E.LOG.MOST.U:
            for i ← 0 to 128-size by size
                if (c[size-1+i..i] = 0) then
                    a[size-1+i..i] ← -1
                else
                    for j ← 0 to size-1
                        if c[size-1+i..j+i] = (0^(size-1-j) || 1) then
                            a[size-1+i..i] ← j
                        endif
                    endfor
                endif
            endfor
        E.SUM:
            p[0] ← 0^128
            for i ← 0 to 128-size by size
                p[i+size] ← p[i] + (c[size-1+i]^(128-size) || c[size-1+i..i])
            endfor
            a ← p[128]
        E.SUM.U:
            p[0] ← 0^128
            for i ← 0 to 128-size by size
                p[i+size] ← p[i] + (0^(128-size) || c[size-1+i..i])
            endfor
            a ← p[128]
    endcase
    RegWrite(rd, 128, a)
enddef
Ensemble Unary Floating-point

These operations take one value from a register, perform a group of floating-point
arithmetic operations on partitions of bits in the operands, and place the concatenated
results in a register.



Operation codes

E.ABS.F.16          Ensemble absolute value floating-point half
E.ABS.F.16.X        Ensemble absolute value floating-point half exception
E.ABS.F.32          Ensemble absolute value floating-point single
E.ABS.F.32.X        Ensemble absolute value floating-point single exception
E.ABS.F.64          Ensemble absolute value floating-point double
E.ABS.F.64.X        Ensemble absolute value floating-point double exception
E.ABS.F.128         Ensemble absolute value floating-point quad
E.ABS.F.128.X       Ensemble absolute value floating-point quad exception
E.COPY.F.16         Ensemble copy floating-point half
E.COPY.F.16.X       Ensemble copy floating-point half exception
E.COPY.F.32         Ensemble copy floating-point single
E.COPY.F.32.X       Ensemble copy floating-point single exception
E.COPY.F.64         Ensemble copy floating-point double
E.COPY.F.64.X       Ensemble copy floating-point double exception
E.COPY.F.128        Ensemble copy floating-point quad
E.COPY.F.128.X      Ensemble copy floating-point quad exception
E.DEFLATE.F.32      Ensemble convert floating-point half from single
E.DEFLATE.F.32.C    Ensemble convert floating-point half from single ceiling
E.DEFLATE.F.32.F    Ensemble convert floating-point half from single floor
E.DEFLATE.F.32.N    Ensemble convert floating-point half from single nearest
E.DEFLATE.F.32.X    Ensemble convert floating-point half from single exact
E.DEFLATE.F.32.Z    Ensemble convert floating-point half from single zero
E.DEFLATE.F.64      Ensemble convert floating-point single from double
E.DEFLATE.F.64.C    Ensemble convert floating-point single from double ceiling
E.DEFLATE.F.64.F    Ensemble convert floating-point single from double floor
E.DEFLATE.F.64.N    Ensemble convert floating-point single from double nearest
E.DEFLATE.F.64.X    Ensemble convert floating-point single from double exact
E.DEFLATE.F.64.Z    Ensemble convert floating-point single from double zero
E.DEFLATE.F.128     Ensemble convert floating-point double from quad
E.DEFLATE.F.128.C   Ensemble convert floating-point double from quad ceiling
E.DEFLATE.F.128.F   Ensemble convert floating-point double from quad floor
E.DEFLATE.F.128.N   Ensemble convert floating-point double from quad nearest
E.DEFLATE.F.128.X   Ensemble convert floating-point double from quad exact
E.DEFLATE.F.128.Z   Ensemble convert floating-point double from quad zero
E.FLOAT.F.16        Ensemble convert floating-point half from doublets
E.FLOAT.F.16.C      Ensemble convert floating-point half from doublets ceiling
E.FLOAT.F.16.F      Ensemble convert floating-point half from doublets floor
E.FLOAT.F.16.N      Ensemble convert floating-point half from doublets nearest
E.FLOAT.F.16.X      Ensemble convert floating-point half from doublets exact
E.FLOAT.F.16.Z      Ensemble convert floating-point half from doublets zero
E.FLOAT.F.32        Ensemble convert floating-point single from quadlets
E.FLOAT.F.32.C      Ensemble convert floating-point single from quadlets ceiling
E.FLOAT.F.32.F      Ensemble convert floating-point single from quadlets floor
E.FLOAT.F.32.N      Ensemble convert floating-point single from quadlets nearest
E.FLOAT.F.32.X      Ensemble convert floating-point single from quadlets exact
E.FLOAT.F.32.Z      Ensemble convert floating-point single from quadlets zero
E.FLOAT.F.64        Ensemble convert floating-point double from octlets
E.FLOAT.F.64.C      Ensemble convert floating-point double from octlets ceiling
E.FLOAT.F.64.F      Ensemble convert floating-point double from octlets floor
E.FLOAT.F.64.N      Ensemble convert floating-point double from octlets nearest
E.FLOAT.F.64.X      Ensemble convert floating-point double from octlets exact
E.FLOAT.F.64.Z      Ensemble convert floating-point double from octlets zero
E.FLOAT.F.128       Ensemble convert floating-point quad from hexlet
E.FLOAT.F.128.C     Ensemble convert floating-point quad from hexlet ceiling
E.FLOAT.F.128.F     Ensemble convert floating-point quad from hexlet floor
E.FLOAT.F.128.N     Ensemble convert floating-point quad from hexlet nearest
E.FLOAT.F.128.X     Ensemble convert floating-point quad from hexlet exact
E.FLOAT.F.128.Z     Ensemble convert floating-point quad from hexlet zero
E.INFLATE.F.16      Ensemble convert floating-point single from half
E.INFLATE.F.16.X    Ensemble convert floating-point single from half exception
E.INFLATE.F.32      Ensemble convert floating-point double from single
E.INFLATE.F.32.X    Ensemble convert floating-point double from single exception
E.INFLATE.F.64      Ensemble convert floating-point quad from double
E.INFLATE.F.64.X    Ensemble convert floating-point quad from double exception
E.NEG.F.16          Ensemble negate floating-point half
E.NEG.F.16.X        Ensemble negate floating-point half exception
E.NEG.F.32          Ensemble negate floating-point single
E.NEG.F.32.X        Ensemble negate floating-point single exception
E.NEG.F.64          Ensemble negate floating-point double
E.NEG.F.64.X        Ensemble negate floating-point double exception
E.NEG.F.128         Ensemble negate floating-point quad
E.NEG.F.128.X       Ensemble negate floating-point quad exception
E.RECEST.F.16       Ensemble reciprocal estimate floating-point half
E.RECEST.F.16.X     Ensemble reciprocal estimate floating-point half exception
E.RECEST.F.32       Ensemble reciprocal estimate floating-point single
E.RECEST.F.32.X     Ensemble reciprocal estimate floating-point single exception
E.RECEST.F.64       Ensemble reciprocal estimate floating-point double
E.RECEST.F.64.X     Ensemble reciprocal estimate floating-point double exception
E.RECEST.F.128      Ensemble reciprocal estimate floating-point quad
E.RECEST.F.128.X    Ensemble reciprocal estimate floating-point quad exception
E.RSQREST.F.16      Ensemble floating-point reciprocal square root estimate half
E.RSQREST.F.16.X    Ensemble floating-point reciprocal square root estimate half exact
E.RSQREST.F.32      Ensemble floating-point reciprocal square root estimate single
E.RSQREST.F.32.X    Ensemble floating-point reciprocal square root estimate single exact
E.RSQREST.F.64      Ensemble floating-point reciprocal square root estimate double
E.RSQREST.F.64.X    Ensemble floating-point reciprocal square root estimate double exact
E.RSQREST.F.128     Ensemble floating-point reciprocal square root estimate quad
E.RSQREST.F.128.X   Ensemble floating-point reciprocal square root estimate quad exact
E.SINK.F.16         Ensemble convert floating-point doublets from half nearest default
E.SINK.F.16.C       Ensemble convert floating-point doublets from half ceiling
E.SINK.F.16.C.D     Ensemble convert floating-point doublets from half ceiling default
E.SINK.F.16.F       Ensemble convert floating-point doublets from half floor
E.SINK.F.16.F.D     Ensemble convert floating-point doublets from half floor default
E.SINK.F.16.N       Ensemble convert floating-point doublets from half nearest
E.SINK.F.16.X       Ensemble convert floating-point doublets from half exact
E.SINK.F.16.Z       Ensemble convert floating-point doublets from half zero
E.SINK.F.16.Z.D     Ensemble convert floating-point doublets from half zero default
E.SINK.F.32         Ensemble convert floating-point quadlets from single nearest default
E.SINK.F.32.C       Ensemble convert floating-point quadlets from single ceiling
E.SINK.F.32.C.D     Ensemble convert floating-point quadlets from single ceiling default
E.SINK.F.32.F       Ensemble convert floating-point quadlets from single floor
E.SINK.F.32.F.D     Ensemble convert floating-point quadlets from single floor default
E.SINK.F.32.N       Ensemble convert floating-point quadlets from single nearest
E.SINK.F.32.X       Ensemble convert floating-point quadlets from single exact
E.SINK.F.32.Z       Ensemble convert floating-point quadlets from single zero
E.SINK.F.32.Z.D     Ensemble convert floating-point quadlets from single zero default
E.SINK.F.64         Ensemble convert floating-point octlets from double nearest default
E.SINK.F.64.C       Ensemble convert floating-point octlets from double ceiling
E.SINK.F.64.C.D     Ensemble convert floating-point octlets from double ceiling default
E.SINK.F.64.F       Ensemble convert floating-point octlets from double floor
E.SINK.F.64.F.D     Ensemble convert floating-point octlets from double floor default
E.SINK.F.64.N       Ensemble convert floating-point octlets from double nearest
E.SINK.F.64.X       Ensemble convert floating-point octlets from double exact
E.SINK.F.64.Z       Ensemble convert floating-point octlets from double zero
E.SINK.F.64.Z.D     Ensemble convert floating-point octlets from double zero default
E.SINK.F.128        Ensemble convert floating-point hexlet from quad nearest default
E.SINK.F.128.C      Ensemble convert floating-point hexlet from quad ceiling
E.SINK.F.128.C.D    Ensemble convert floating-point hexlet from quad ceiling default
E.SINK.F.128.F      Ensemble convert floating-point hexlet from quad floor
E.SINK.F.128.F.D    Ensemble convert floating-point hexlet from quad floor default
E.SINK.F.128.N      Ensemble convert floating-point hexlet from quad nearest
E.SINK.F.128.X      Ensemble convert floating-point hexlet from quad exact
E.SINK.F.128.Z      Ensemble convert floating-point hexlet from quad zero
E.SINK.F.128.Z.D    Ensemble convert floating-point hexlet from quad zero default
E.SQR.F.16          Ensemble square root floating-point half
E.SQR.F.16.C        Ensemble square root floating-point half ceiling
E.SQR.F.16.F        Ensemble square root floating-point half floor
E.SQR.F.16.N        Ensemble square root floating-point half nearest
E.SQR.F.16.X        Ensemble square root floating-point half exact
E.SQR.F.16.Z        Ensemble square root floating-point half zero
E.SQR.F.32          Ensemble square root floating-point single
E.SQR.F.32.C        Ensemble square root floating-point single ceiling
E.SQR.F.32.F        Ensemble square root floating-point single floor
E.SQR.F.32.N        Ensemble square root floating-point single nearest
E.SQR.F.32.X        Ensemble square root floating-point single exact
E.SQR.F.32.Z        Ensemble square root floating-point single zero
E.SQR.F.64          Ensemble square root floating-point double
E.SQR.F.64.C        Ensemble square root floating-point double ceiling
E.SQR.F.64.F        Ensemble square root floating-point double floor
E.SQR.F.64.N        Ensemble square root floating-point double nearest
E.SQR.F.64.X        Ensemble square root floating-point double exact
E.SQR.F.64.Z        Ensemble square root floating-point double zero
E.SQR.F.128         Ensemble square root floating-point quad
E.SQR.F.128.C       Ensemble square root floating-point quad ceiling
E.SQR.F.128.F       Ensemble square root floating-point quad floor
E.SQR.F.128.N       Ensemble square root floating-point quad nearest
E.SQR.F.128.X       Ensemble square root floating-point quad exact
E.SQR.F.128.Z       Ensemble square root floating-point quad zero
E.SUM.F.16          Ensemble sum floating-point half
E.SUM.F.16.C        Ensemble sum floating-point half ceiling
E.SUM.F.16.F        Ensemble sum floating-point half floor
E.SUM.F.16.N        Ensemble sum floating-point half nearest
E.SUM.F.16.X        Ensemble sum floating-point half exact
E.SUM.F.16.Z        Ensemble sum floating-point half zero
E.SUM.F.32          Ensemble sum floating-point single
E.SUM.F.32.C        Ensemble sum floating-point single ceiling
E.SUM.F.32.F        Ensemble sum floating-point single floor
E.SUM.F.32.N        Ensemble sum floating-point single nearest
E.SUM.F.32.X        Ensemble sum floating-point single exact
E.SUM.F.32.Z        Ensemble sum floating-point single zero
E.SUM.F.64          Ensemble sum floating-point double
E.SUM.F.64.C        Ensemble sum floating-point double ceiling
E.SUM.F.64.F        Ensemble sum floating-point double floor
E.SUM.F.64.N        Ensemble sum floating-point double nearest
E.SUM.F.64.X        Ensemble sum floating-point double exact
E.SUM.F.64.Z        Ensemble sum floating-point double zero
E.SUM.F.128         Ensemble sum floating-point quad
E.SUM.F.128.C       Ensemble sum floating-point quad ceiling
E.SUM.F.128.F       Ensemble sum floating-point quad floor
E.SUM.F.128.N       Ensemble sum floating-point quad nearest
E.SUM.F.128.X       Ensemble sum floating-point quad exact
E.SUM.F.128.Z       Ensemble sum floating-point quad zero






Selection

class                              op        prec            round/trap
copy                               COPY      16 32 64 128    NONE X
absolute value                     ABS       16 32 64 128    NONE X
float from integer                 FLOAT     16 32 64 128    NONE C F N X Z
integer from float                 SINK      16 32 64 128    NONE C F N X Z C.D F.D Z.D
increase format precision          INFLATE   16 32 64        NONE X
decrease format precision          DEFLATE   32 64 128       NONE C F N X Z
negate                             NEG       16 32 64 128    NONE X
reciprocal estimate                RECEST    16 32 64 128    NONE X
reciprocal square root estimate    RSQREST   16 32 64 128    NONE X
square root                        SQR       16 32 64 128    NONE C F N X Z
sum                                SUM       16 32 64 128    NONE C F N X Z



Format 

E.op.prec.round rd=rc

rd=eopprecround(rc)

31 24 23 18 17 12 11 6 5 0

| E.prec | rd | rc | op | E.UNARY |

8 6 6 6 6



Description

The contents of register rc are used as the operand of the specified floating-point operation.
The result is placed in register rd.

The operation is rounded using the specified rounding option or using round-to-nearest if
not specified. If a rounding option is specified, unless default exception handling is specified,
the operation raises a floating-point exception if a floating-point invalid operation, divide by
zero, overflow, or underflow occurs, or when specified, if the result is inexact. If a rounding
option is not specified or if default exception handling is specified, floating-point exceptions
are not raised, and are handled according to the default rules of IEEE 754.

The reciprocal estimate and reciprocal square root estimate instructions compute an exact
result for half precision, and a result with at least 12 bits of significant precision for larger
formats.
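The SINK rounding suffixes select how a floating-point lane is rounded to an integer. The Python sketch below maps C, F, N, and Z onto math.ceil, math.floor, the built-in round (which rounds halves to even, like IEEE 754 round-to-nearest), and math.trunc. It models one lane with finite input only. The exact (X) and default-handling (D) variants are not modeled, and raising ValueError for an out-of-range result is an assumption that stands in for whatever exception or result the hardware produces.

```python
import math

# Rounding suffix -> rounding function (finite floats only).
ROUNDERS = {
    "C": math.ceil,   # ceiling: toward +infinity
    "F": math.floor,  # floor: toward -infinity
    "N": round,       # nearest, ties to even (Python's round on floats)
    "Z": math.trunc,  # zero: toward zero
}

def sink(x, mode, size):
    """Convert one float lane to a signed size-bit integer (E.SINK.F sketch)."""
    r = ROUNDERS[mode](x)
    lo, hi = -(1 << (size - 1)), (1 << (size - 1)) - 1
    if not lo <= r <= hi:
        raise ValueError(f"{x} does not fit in a signed {size}-bit lane")
    return r
```

The four modes disagree on -2.5: ceiling and zero give -2, floor gives -3, and nearest gives -2 because -2 is the even neighbor.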

Definition

def EnsembleUnaryFloatingPoint(op,prec,round,rd,rc) as
    c ← RegRead(rc, 128)
    case op of
        E.ABS.F, E.COPY.F, E.NEG.F, E.RECEST.F, E.RSQREST.F, E.SQR.F:
            for i ← 0 to 128-prec by prec
                ci ← F(prec, c[i+prec-1..i])
                case op of
                    E.ABS.F:
                        ai.t ← ci.t
                        ai.s ← 0
                        ai.e ← ci.e
                        ai.f ← ci.f
                    E.COPY.F:
                        ai ← ci
                    E.NEG.F:
                        ai.t ← ci.t
                        ai.s ← ~ci.s
                        ai.e ← ci.e
                        ai.f ← ci.f
                    E.RECEST.F:
                        ai ← frecest(ci)
                    E.RSQREST.F:
                        ai ← frsqrest(ci)
                    E.SQR.F:
                        ai ← fsqr(ci)
                endcase
                a[i+prec-1..i] ← PackF(prec, ai, round)
            endfor
        E.SUM.F:
            p[0].t ← NULL
            for i ← 0 to 128-prec by prec
                p[i+prec] ← fadd(p[i], F(prec, c[i+prec-1..i]))
            endfor
            a ← PackF(prec, p[128], round)
        E.SINK.F:
            for i ← 0 to 128-prec by prec
                ci ← F(prec, c[i+prec-1..i])
                a[i+prec-1..i] ← fsinkr(prec, ci, round)
            endfor
        E.FLOAT.F:
            for i ← 0 to 128-prec by prec
                ci.t ← NORM
                ci.e ← 0
                ci.s ← c[i+prec-1]
                ci.f ← ci.s ? 1+~c[i+prec-1..i] : c[i+prec-1..i]
                a[i+prec-1..i] ← PackF(prec, ci, round)
            endfor
        E.INFLATE.F:
            for i ← 0 to 64-prec by prec
                ci ← F(prec, c[i+prec-1..i])
                a[i+i+prec+prec-1..i+i] ← PackF(prec+prec, ci, round)
            endfor
        E.DEFLATE.F:
            for i ← 0 to 128-prec by prec
                ci ← F(prec, c[i+prec-1..i])
                a[(i+prec)/2-1..i/2] ← PackF(prec/2, ci, round)
            endfor
            a[127..64] ← 0
    endcase
    RegWrite(rd, 128, a)
enddef

Exceptions

Floating-point arithmetic
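ABS and NEG change only the sign field (ai.s) and copy the type, exponent, and fraction unchanged. On the packed encoding they therefore reduce to single-bit operations, and a NaN keeps its payload. A minimal Python sketch for one binary16 lane follows. It assumes the standard IEEE 754 binary16 layout with the sign in bit 15, and it uses the struct module's "e" (half-precision) format only to decode values for checking.

```python
import struct

SIGN_16 = 0x8000  # sign bit of an IEEE 754 binary16 encoding

def abs_f16(bits):
    """E.ABS.F.16 on one lane: clear the sign, keep exponent and fraction."""
    return bits & ~SIGN_16 & 0xFFFF

def neg_f16(bits):
    """E.NEG.F.16 on one lane: invert the sign, keep exponent and fraction."""
    return bits ^ SIGN_16

def f16_value(bits):
    """Decode a 16-bit pattern as a binary16 value (for checking only)."""
    return struct.unpack("<e", bits.to_bytes(2, "little"))[0]
```

For example, 0x3C00 encodes 1.0, neg_f16(0x3C00) is 0xBC00 (-1.0), and abs_f16 maps it back to 0x3C00.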
Wide Multiply Matrix

These instructions take an address from a general register to fetch a large operand from 
memory, a second operand from a general register, perform a group of operations on 
partitions of bits in the operands, and catenate the results together, placing the result in a 
general register.



Operation codes 



W.MULMAT.8.B        Wide multiply matrix signed byte big-endian
W.MULMAT.8.L        Wide multiply matrix signed byte little-endian
W.MULMAT.16.B       Wide multiply matrix signed doublet big-endian
W.MULMAT.16.L       Wide multiply matrix signed doublet little-endian
W.MULMAT.32.B       Wide multiply matrix signed quadlet big-endian
W.MULMAT.32.L       Wide multiply matrix signed quadlet little-endian
W.MULMAT.C.8.B      Wide multiply matrix signed complex byte big-endian
W.MULMAT.C.8.L      Wide multiply matrix signed complex byte little-endian
W.MULMAT.C.16.B     Wide multiply matrix signed complex doublet big-endian
W.MULMAT.C.16.L     Wide multiply matrix signed complex doublet little-endian
W.MULMAT.M.8.B      Wide multiply matrix mixed-signed byte big-endian
W.MULMAT.M.8.L      Wide multiply matrix mixed-signed byte little-endian
W.MULMAT.M.16.B     Wide multiply matrix mixed-signed doublet big-endian
W.MULMAT.M.16.L     Wide multiply matrix mixed-signed doublet little-endian
W.MULMAT.M.32.B     Wide multiply matrix mixed-signed quadlet big-endian
W.MULMAT.M.32.L     Wide multiply matrix mixed-signed quadlet little-endian
W.MULMAT.P.8.B      Wide multiply matrix polynomial byte big-endian
W.MULMAT.P.8.L      Wide multiply matrix polynomial byte little-endian
W.MULMAT.P.16.B     Wide multiply matrix polynomial doublet big-endian
W.MULMAT.P.16.L     Wide multiply matrix polynomial doublet little-endian
W.MULMAT.P.32.B     Wide multiply matrix polynomial quadlet big-endian
W.MULMAT.P.32.L     Wide multiply matrix polynomial quadlet little-endian
W.MULMAT.U.8.B      Wide multiply matrix unsigned byte big-endian
W.MULMAT.U.8.L      Wide multiply matrix unsigned byte little-endian
W.MULMAT.U.16.B     Wide multiply matrix unsigned doublet big-endian
W.MULMAT.U.16.L     Wide multiply matrix unsigned doublet little-endian
W.MULMAT.U.32.B     Wide multiply matrix unsigned quadlet big-endian
W.MULMAT.U.32.L     Wide multiply matrix unsigned quadlet little-endian



Selection

class       op          type          size        order
multiply    W.MULMAT    NONE M U P    8 16 32     B L
                        C             8 16        B L
Format

W.op.size.order rd=rc,rb

rd=wopsizeorder(rc,rb)

31 24 23 18 17 12 11

| W.MINOR.order | rd | rc | rb |

8 6 6 6

Description 

The contents of register rc are used as a virtual address, and a value of specified size is loaded
from memory. A second value is the contents of register rb. The values are partitioned into
groups of operands of the size specified. The second values are multiplied with the first
values, then summed, producing a group of result values. The group of result values is
catenated and placed in register rd.

The memory multiply instructions (W.MULMAT, W.MULMAT.C, W.MULMAT.M,
W.MULMAT.P, W.MULMAT.U) perform a partitioned array multiply of up to 8192 bits,
that is 64x128 bits. The width of the array can be limited to 64, 32, or 16 bits, but not smaller
than twice the group size, by adding one-half the desired size in bytes to the virtual address
operand: 4, 2, or 1. The array can be limited vertically to 128, 64, 32, or 16 bits, but not
smaller than twice the group size, by adding one-half the desired memory operand size in
bytes to the virtual address operand.
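The size encoding relies on the lowest set bit of the address: the definition isolates it with c and (0-c), scales it by 16 (the || 0^4 step), and clears it with c and (c-1) before decoding the memory size the same way. A minimal Python sketch of the width step follows. It assumes byte groups (gsize = 8), so that bits 2..0 carry the width encoding, and decode_width is a hypothetical helper name.

```python
def decode_width(addr):
    """Split a W.MULMAT byte-group address into (width_bits, remaining_address).

    Assumes gsize = 8, so bits 2..0 carry the width encoding.
    """
    if (addr & 0b111) == 0:
        return 64, addr                      # no width bit: full 64-bit width
    lowest = addr & -addr                    # lowest set bit: 1, 2 or 4
    return lowest * 16, addr & (addr - 1)    # *16 mirrors "|| 0^4"; clear the bit
```

Adding 2 to an aligned address selects a 32-bit wide array: decode_width(0x1002) returns (32, 0x1000). Any higher memory-size bit survives in the remaining address for the next decoding step.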

The virtual address must either be aligned to 1024/gsize bytes (or 512/gsize for
W.MULMAT.C), with gsize measured in bits, or must be the sum of an aligned address and
one-half of the size of the memory operand in bytes and/or one-quarter of the size of the
result in bytes. An aligned address must be an exact multiple of the size expressed in bytes. If
the address is not valid, an "access disallowed by virtual address" exception occurs.

A wide multiply octlets instruction (W.MULMAT.type.64, type=NONE M U P) is not
implemented and causes a reserved instruction exception, as an ensemble multiply sum
octlets instruction (E.MULSUM.type.64) performs the same operation except that the
multiplier is sourced from a 128-bit register rather than memory. Similarly, instead of a wide
multiply complex quadlets instruction (W.MULMAT.C.32), one should use an ensemble
multiply sum complex quadlets instruction (E.MULSUM.C.32).

A wide multiply doublets instruction (W.MULMAT, W.MULMAT.M, W.MULMAT.P,
W.MULMAT.U) multiplies memory [m31 m30 ... m1 m0] with vector [h g f e d c b a],
yielding products [hm31+gm27+...+bm7+am3 ... hm28+gm24+...+bm4+am0]:
Figure: Wide multiply matrix
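The example products describe a matrix-vector product: output k sums vector[j] * m[4j + k]. The memory is therefore read as rows of four doublets (a 64-bit wide array), one row per vector element. The Python sketch below models only that indexing on plain integers, with the row width as a parameter. Lane widths, signedness, byte order, and result packing are not modeled, and the function name is illustrative.

```python
def wide_multiply_matrix(memory, vector, width):
    """Integer model of W.MULMAT: result[k] = sum_j vector[j] * memory[j*width + k].

    memory lists m0, m1, ... in increasing order; vector lists a, b, c, ...
    (lane 0 first). Each vector element multiplies one row of `width` elements.
    """
    rows = len(vector)
    if len(memory) != rows * width:
        raise ValueError("memory must hold one row of `width` elements per vector element")
    return [sum(vector[j] * memory[j * width + k] for j in range(rows))
            for k in range(width)]
```

With m_i = i and a..h = 1..8, wide_multiply_matrix(list(range(32)), list(range(1, 9)), 4) gives 672 for output 0 (hm28+...+am0) and 780 for output 3 (hm31+...+am3).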

A wide multiply matrix complex doublets instruction (W.MULMAT.C) multiplies memory
[m15 m14 ... m1 m0] with vector [h g f e d c b a], yielding products
[hm14+gm15+...+bm2+am3 ... hm12+gm13+...+bm0+am1 -hm13+gm12+...-bm1+am0]:



Figure: Wide multiply matrix complex
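Reading the example products, each adjacent pair of doublets forms one complex value with the real part in the lower-numbered element (a real and b imaginary; m0 real and m1 imaginary). The memory holds two complex columns per row. Under that reading, which is inferred from the example rather than stated in the text, the operation is an ordinary complex matrix-vector product. The sketch below uses Python complex numbers, and the function name is illustrative.

```python
def wide_multiply_matrix_complex(memory, vector, width):
    """Complex model of W.MULMAT.C.

    memory: m0, m1, ... with the real part at even and the imaginary part at odd index.
    vector: a, b, c, d, ... read as (a + b*1j), (c + d*1j), ...
    width:  complex columns per memory row.
    """
    m = [complex(memory[n], memory[n + 1]) for n in range(0, len(memory), 2)]
    v = [complex(vector[n], vector[n + 1]) for n in range(0, len(vector), 2)]
    rows = len(v)
    return [sum(v[j] * m[j * width + k] for j in range(rows)) for k in range(width)]
```

With m_i = i and a..h = 1..8, output 0 is -44+312j: the real part matches -hm13+gm12-...-bm1+am0 and the imaginary part matches hm12+gm13+...+bm0+am1.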
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DefioffiQD 

def mul(size,h,vs,v,i,ws,w,j) as
    mul ← ((vs&v[size-1+i])^(h-size) || v[size-1+i..i]) * ((ws&w[size-1+j])^(h-size) || w[size-1+j..j])
enddef

def c ← PolyMultiply(size,a,b) as
    p[0] ← 0^(2*size)
    for k ← 0 to size-1
        p[k+1] ← p[k] xor (a[k] ? (0^(size-k) || b || 0^k) : 0^(2*size))
    endfor
    c ← p[size]
enddef

def MemoryMultiply(major,op,gsize,rd,rc,rb)
    d ← RegRead(rd, 128)
    c ← RegRead(rc, 64)
    b ← RegRead(rb, 128)
    lgsize ← log(gsize)
    if c[lgsize-4..0] ≠ 0 then
        raise AccessDisallowedByVirtualAddress
    endif
    if c[2..lgsize-3] ≠ 0 then
        wsize ← (c and (0-c)) || 0^4
        t ← c and (c-1)
    else
        wsize ← 64
        t ← c
    endif
    lwsize ← log(wsize)
    if t[lwsize+6-lgsize..lwsize-3] ≠ 0 then
        msize ← (t and (0-t)) || 0^4
        VirtAddr ← t and (t-1)
    else
        VirtAddr ← t
    endif
    case major of
        W.MINOR.B:
            order ← B
        W.MINOR.L:
            order ← L
    endcase
    case op of
        W.MUL.MAT.U.8, W.MUL.MAT.U.16, W.MUL.MAT.U.32, W.MUL.MAT.U.64:
            ms ← bs ← 0
        W.MUL.MAT.M.8, W.MUL.MAT.M.16, W.MUL.MAT.M.32, W.MUL.MAT.M.64:
            ms ← 0
            bs ← 1
        W.MUL.MAT.8, W.MUL.MAT.16, W.MUL.MAT.32, W.MUL.MAT.64,
        W.MUL.MAT.C.8, W.MUL.MAT.C.16, W.MUL.MAT.C.32, W.MUL.MAT.C.64:
            ms ← bs ← 1
        W.MUL.MAT.P.8, W.MUL.MAT.P.16, W.MUL.MAT.P.32, W.MUL.MAT.P.64:
    endcase
    m ← LoadMemory(c,VirtAddr,msize,order)
    h ← 2*gsize
    for i ← 0 to wsize-gsize by gsize
        q[0] ← 0^(2*gsize)
        for j ← 0 to vsize-gsize by gsize
            case op of
                W.MUL.MAT.P.8, W.MUL.MAT.P.16, W.MUL.MAT.P.32, W.MUL.MAT.P.64:
                    k ← i+wsize*j[8..lgsize]
                    q[j+gsize] ← q[j] xor PolyMultiply(gsize,m[k+gsize-1..k],b[j+gsize-1..j])
                W.MUL.MAT.C.8, W.MUL.MAT.C.16, W.MUL.MAT.C.32, W.MUL.MAT.C.64:
                    if (~i) & j & gsize = 0 then
                        k ← i-(j&gsize)+wsize*j[8..lgsize+1]
                        q[j+gsize] ← q[j] + mul(gsize,h,ms,m,k,bs,b,j)
                    else
                        k ← i+gsize+wsize*j[8..lgsize+1]
                        q[j+gsize] ← q[j] - mul(gsize,h,ms,m,k,bs,b,j)
                    endif
                W.MUL.MAT.8, W.MUL.MAT.16, W.MUL.MAT.32, W.MUL.MAT.64,
                W.MUL.MAT.M.8, W.MUL.MAT.M.16, W.MUL.MAT.M.32, W.MUL.MAT.M.64,
                W.MUL.MAT.U.8, W.MUL.MAT.U.16, W.MUL.MAT.U.32, W.MUL.MAT.U.64:
                    q[j+gsize] ← q[j] + mul(gsize,h,ms,m,i+wsize*j[8..lgsize],bs,b,j)
            endcase
        endfor
        a[2*gsize-1+2*i..2*i] ← q[vsize]
    endfor
    a[127..2*wsize] ← 0
    RegWrite(rd, 128, a)
enddef
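The mul helper above widens each gsize-bit operand to h bits before multiplying. It replicates the operand's top bit when that operand's sign flag (ms for memory, bs for the register, set per opcode) is 1, and pads with zeros otherwise. The Python sketch below models only that sign handling. The helper name extend is hypothetical. Python integers are unbounded, so the h-bit product width is not modelled.

```python
def extend(x: int, size: int, signed: bool) -> int:
    """Interpret the low `size` bits of x as signed or unsigned (hypothetical helper)."""
    x &= (1 << size) - 1
    if signed and (x >> (size - 1)) & 1:
        x -= 1 << size  # two's-complement sign extension
    return x


def mul(size: int, vs: bool, v: int, ws: bool, w: int) -> int:
    """Multiply two size-bit fields, each extended according to its own sign flag."""
    return extend(v, size, vs) * extend(w, size, ws)


# The same bit patterns under the U (ms = bs = 0), M (ms = 0, bs = 1) and signed forms:
print(mul(8, False, 0xFF, False, 0xFF))  # 65025
print(mul(8, False, 0xFF, True, 0xFF))   # -255
print(mul(8, True, 0xFF, True, 0xFF))    # 1
```

The mixed-signed (M) forms therefore treat the memory matrix as unsigned and the register vector as signed.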

Exceptions 

Access disallowed by virtual address
Access disallowed by tag
Access disallowed by global TB
Access disallowed by local TB
Access detail required by tag
Access detail required by local TB
Access detail required by global TB
Local TB miss
Global TB miss



Wide Multiply Matrix Extract

These instructions take an address from a general register to fetch a large operand from
memory, a second operand from a general register, perform a group of operations on
partitions of bits in the operands, and catenate the results together, placing the result in a
general register.



Operation codes 



W.MUL.MAT.X.B    Wide multiply matrix extract big-endian
W.MUL.MAT.X.L    Wide multiply matrix extract little-endian



Format 



op ra=rc,rd,rb
ra=op(rc,rd,rb)

 31    24 23   18 17   12 11    6 5    0
|   op   |  rd  |  rc  |  rb  |  ra  |
     8       6      6      6      6



Description

The contents of register rc are used as a virtual address, and a value of specified size is loaded
from memory. A second value is the contents of register rd. The group size and other
parameters are specified from the contents of register rb. The values are partitioned into
groups of operands of the size specified and are multiplied and summed, producing a group
of values. The group of values is rounded, and limited as specified, yielding a group of
results which is the size specified. The group of results is catenated and placed in register ra.

NOTE: The size of this operation is determined from the contents of register rb. The
multiplier usage is constant, but the memory operand size is inversely related to the
group size. Presumably this can be checked for cache validity.

We also use low order bits of rc to designate a size, which must be consistent with
the group size. Because the memory operand is cached, the size can also be cached,
thus eliminating the time required to decode the size, whether from rb or from rc.

The wide multiply matrix extract instructions (W.MUL.MAT.X.B, W.MUL.MAT.X.L)
perform a partitioned array multiply of up to 16384 bits, that is 128x128 bits. The width of
the array can be limited to 128, 64, 32, or 16 bits, but not smaller than twice the group size,
by adding one-half the desired size in bytes to the virtual address operand: 8, 4, 2, or 1. The
array can be limited vertically to 128, 64, 32, or 16 bits, but not smaller than twice the group
size, by adding one-half the desired memory operand size in bytes to the virtual address
operand.
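The width limit is recovered from the address operand by isolating its lowest set bit. The definitions do this with wsize ← (c and (0-c)) || 0^4 and t ← c and (c-1). The lowest set bit gives half the width in bytes, and appending four zero bits multiplies it by 16 to give the width in bits. The Python sketch below shows that step. It assumes, as in the W.MUL.MAT.X definition, that only address bits 3..0 may carry the width marker. The name decode_width and the base address are hypothetical. The vertical limit is recovered from the remaining address t with the same lowest-set-bit technique.

```python
from typing import Tuple


def decode_width(c: int, default_wsize: int = 128) -> Tuple[int, int]:
    """Return (wsize in bits, address with the width marker removed)."""
    if c & 0xF:                  # a marker of 1, 2, 4 or 8 bytes was added
        lowest = c & -c          # value of the lowest set bit = half the width in bytes
        wsize = lowest << 4      # half-bytes * 2 * 8 = bits, i.e. append four zero bits
        t = c & (c - 1)          # clear that bit to leave the rest of the address
    else:
        wsize, t = default_wsize, c
    return wsize, t


base = 0x1000  # hypothetical suitably aligned address
for half_bytes in (8, 4, 2, 1):
    print(decode_width(base + half_bytes))
```

Each added marker of 8, 4, 2 or 1 bytes maps to a width of 128, 64, 32 or 16 bits, and the aligned base address is returned unchanged.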
Bits 31..0 of the contents of register rb specify several parameters which control the
manner in which data is extracted. The position and default values of the control fields
allow for the source position to be added to a fixed control value for dynamic computation,
and allow for the lower 16 bits of the control field to be set for some of the simpler extract
cases by a single G.COPY.I instruction.

 31    24 23    16  15  14  13  12  11  10  9  8      0
|  fsize |  dpos  | x | s | n | m | l | rnd | gssp   |
     8        8     1   1   1   1   1    2      9



The table below describes the meaning of each label:



label   bits   meaning
fsize   8      field size
dpos    8      destination position
x       1      reserved
s       1      signed vs. unsigned
n       1      complex vs. real multiplication
m       1      mixed-sign vs. same-sign multiplication
l       1      saturation vs. truncation
rnd     2      rounding
gssp    9      group size and source position



The 9-bit gssp field encodes both the group size, gsize, and source position, spos,
according to the formula gssp = 512-4*gsize+spos. The group size, gsize, is a power of
two in the range 1..128. The source position, spos, is in the range 0..(2*gsize)-1.

The values in the s, n, m, l, and rnd fields have the following meaning:



values   s          n         m            l          rnd
0        unsigned   real      same-sign    truncate   F
1        signed     complex   mixed-sign   saturate   Z
2                                                     N
3                                                     C
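To make the field layout concrete, the Python sketch below unpacks a 32-bit rb control word into the fields tabulated above. It also splits gssp back into the specified group size and source position. It assumes the rounding codes F=0, Z=1, N=2 and C=3 from the value table. It reports the group size before the clamping the definition applies: gsize is forced to at least 8 and at most wsize/2. The names Control and decode_control are hypothetical.

```python
from typing import NamedTuple

ROUNDING = ("F", "Z", "N", "C")  # rnd field values 0..3, per the value table


class Control(NamedTuple):
    fsize: int
    dpos: int
    signed: bool
    complex_: bool
    mixed: bool
    saturate: bool
    rnd: str
    sgsize: int  # specified group size, before the definition clamps it
    spos: int


def decode_control(rb: int) -> Control:
    gssp = rb & 0x1FF  # bits 8..0
    # gssp = 512 - 4*gsize + spos with 0 <= spos < 2*gsize, so each group size
    # owns the range [512 - 4*gsize, 512 - 2*gsize); halve until gssp is inside it.
    sgsize = 128
    while sgsize > 1 and gssp >= 512 - 2 * sgsize:
        sgsize //= 2
    return Control(
        fsize=(rb >> 24) & 0xFF,
        dpos=(rb >> 16) & 0xFF,
        signed=bool(rb >> 14 & 1),
        complex_=bool(rb >> 13 & 1),
        mixed=bool(rb >> 12 & 1),
        saturate=bool(rb >> 11 & 1),
        rnd=ROUNDING[(rb >> 9) & 3],
        sgsize=sgsize,
        spos=gssp & (2 * sgsize - 1),  # same masking as the definition's spos
    )


# 8-bit field, signed, round to nearest, 16-bit groups, source position 5:
word = (8 << 24) | (1 << 14) | (2 << 9) | (512 - 4 * 16 + 5)
print(decode_control(word))
```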



The virtual address must be aligned, that is, it must be an exact multiple of the operand size
expressed in bytes. If the address is not aligned an "access disallowed by virtual address"
exception occurs.

Z (zero) rounding is not defined for unsigned extract operations, and a ReservedInstruction
exception is raised if attempted. F (floor) rounding will properly round unsigned results
downward.
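The four rounding modes can be modelled on an unbounded integer product p from which the low r bits are discarded. The Python sketch below is illustrative only. It omits the field-width limiting and saturation steps of the definition. It also treats N as round-half-to-even, which is an assumption about the intended tie rule. For non-negative p, F simply truncates, which is why it rounds unsigned results downward.

```python
def shift_round(p: int, r: int, mode: str) -> int:
    """Return p / 2**r rounded by mode: F floor, C ceiling, Z toward zero, N nearest."""
    if r == 0:
        return p
    if mode == "F":  # Python's >> already rounds toward minus infinity
        return p >> r
    if mode == "C":
        return (p + (1 << r) - 1) >> r
    if mode == "Z":  # bias negative values so the floor lands on the truncated value
        return (p + ((1 << r) - 1 if p < 0 else 0)) >> r
    if mode == "N":  # nearest, ties to even (assumed tie rule)
        q, rem = p >> r, p & ((1 << r) - 1)
        half = 1 << (r - 1)
        if rem > half or (rem == half and q & 1):
            q += 1
        return q
    raise ValueError(f"unknown rounding mode {mode!r}")


for mode in "FCZN":
    print(mode, shift_round(-5, 1, mode), shift_round(6, 2, mode))
```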


A wide multiply matrix extract doublets instruction (W.MUL.MAT.X.B or
W.MUL.MAT.X.L) multiplies memory [m63 m62 m61 ... m2 m1 m0] with vector [h g f e d
c b a], yielding the products [am7+bm15+cm23+dm31+em39+fm47+gm55+hm63 ...
am2+bm10+cm18+dm26+em34+fm42+gm50+hm58
am1+bm9+cm17+dm25+em33+fm41+gm49+hm57
am0+bm8+cm16+dm24+em32+fm40+gm48+hm56], rounded and limited
as specified:




Wide multiply extract matrix doublets 
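The index pattern in the doublets example generalizes to any square n-by-n case: result k collects a·m[k], b·m[k+8], ..., h·m[k+56]. The short Python sketch below computes the unrounded sums. The vector is given lowest element first (v[0] is a), and results are also listed lowest first, the reverse of the bracket order used in the text. The name matrix_products is hypothetical, and the rounding and limiting step is omitted.

```python
def matrix_products(m, v):
    """Unrounded sums r[k] = sum over j of v[j] * m[k + n*j], for an n-by-n matrix m."""
    n = len(v)
    if len(m) != n * n:
        raise ValueError("memory operand must hold n*n elements")
    return [sum(v[j] * m[k + n * j] for j in range(n)) for k in range(n)]


# 8 doublets: an identity matrix returns the vector unchanged.
identity = [1 if i % 8 == i // 8 else 0 for i in range(64)]
print(matrix_products(identity, [1, 2, 3, 4, 5, 6, 7, 8]))
```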



A wide-multiply-matrix-extract instruction (W.MUL.MAT.X with n set in

rb) multiplies memory [m31 m30 m29 ... m2 m1 m0] with vector [h g f e d c b a], yielding
the products [am7+bm6+cm15+dm14+em23+fm22+gm31+hm30 ... am2-bm3+cm10-
dm11+em18-fm19+gm26-hm27 am1+bm0+cm9+dm8+em17+fm16+gm25+hm24 am0-
bm1+cm8-dm9+em16-fm17+gm24-hm25], rounded and limited as specified:






Wide multiply extract matrix complex doublets 
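In the complex form, each pair of adjacent elements is one complex number with the real part in the lower position. The vector [h g f e d c b a] holds a+bi, c+di, e+fi and g+hi, and the memory pair (m0, m1) holds m0+m1·i. The Python sketch below uses integer pairs so the arithmetic stays exact. The name complex_matrix_products is hypothetical, and the rounding step is again omitted.

```python
def complex_matrix_products(m, v):
    """v = [a, b, c, d, ...] as (real, imag) pairs; m is the flat memory operand.
    Returns [re0, im0, re1, im1, ...] where complex result q is the sum over p of
    v_p * (m[2q + 2n*p] + i*m[2q + 1 + 2n*p])."""
    n = len(v) // 2            # number of complex vector elements
    stride = 2 * n             # memory elements per row
    if len(m) != stride * n:
        raise ValueError("memory operand must hold 2*n*n elements")
    out = []
    for q in range(n):
        re = im = 0
        for p in range(n):
            vr, vi = v[2 * p], v[2 * p + 1]
            mr, mi = m[2 * q + stride * p], m[2 * q + 1 + stride * p]
            re += vr * mr - vi * mi
            im += vr * mi + vi * mr
        out += [re, im]
    return out


m = [0] * 32
m[0], m[1] = 3, 4              # m0 + m1*i = 3 + 4i
print(complex_matrix_products(m, [1, 2, 0, 0, 0, 0, 0, 0]))  # (1+2i)(3+4i) = -5+10i
```

The real part of result 0 expands to am0-bm1+cm8-dm9+..., matching the last term of the bracketed list above.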



Definition 

def mul(size,h,vs,v,i,ws,w,j) as
enddef

def WideMultiplyExtractMatrix(op,ra,rb,rc,rd)
    d ← RegRead(rd, 128)
    c ← RegRead(rc, 64)
    b ← RegRead(rb, 128)
    case b[8..0] of
        0..255:
            sgsize ← 128
        256..383:
            sgsize ← 64
        384..447:
            sgsize ← 32
        448..479:
            sgsize ← 16
        480..495:
            sgsize ← 8
        496..503:
            sgsize ← 4
        504..507:
            sgsize ← 2
        508..511:
            sgsize ← 1
    endcase
    m ← b[12]
    n ← b[13]
    signed ← b[14]
    if c[3..0] ≠ 0 then
        wsize ← (c and (0-c)) || 0^4
        t ← c and (c-1)
    else
        wsize ← 128
        t ← c
    endif
    if sgsize < 8 then
        gsize ← 8
    elseif sgsize > wsize/2 then
        gsize ← wsize/2
    else
        gsize ← sgsize
    endif
    lgsize ← log(gsize)
    lwsize ← log(wsize)
    if t[lwsize+6-n-lgsize..lwsize-3] ≠ 0 then
        msize ← (t and (0-t)) || 0^4
        VirtAddr ← t and (t-1)
    else
        msize ← 64*(2-n)*wsize/gsize
        VirtAddr ← t
    endif
    vsize ← (1+n)*msize*gsize/wsize
    mm ← LoadMemory(c,VirtAddr,msize,order)
    h ← (2*gsize)+7-lgsize
    lmsize ← log(msize)
    if VirtAddr[lmsize-4..0] ≠ 0 then
        raise AccessDisallowedByVirtualAddress
    endif
    case op of
        W.MUL.MAT.X.B:
            order ← B
        W.MUL.MAT.X.L:
            order ← L
    endcase
    ms ← signed
    ds ← signed xor m
    as ← signed or m
    spos ← (b[8..0]) and (2*gsize-1)
    dpos ← (0 || b[23..16]) and (gsize-1)
    r ← spos
    sfsize ← (0 || b[31..24]) and (gsize-1)
    tfsize ← (sfsize = 0) or ((sfsize+dpos) > gsize) ? gsize-dpos : sfsize
    fsize ← (tfsize + spos > h) ? h - spos : tfsize
    if (b[10..9] = Z) and not signed then
        rnd ← F
    else
        rnd ← b[10..9]
    endif
    for i ← 0 to wsize-gsize by gsize
        q[0] ← 0^(2*gsize+7-lgsize)
        for j ← 0 to vsize-gsize by gsize
            if n then
                if (~i) & j & gsize = 0 then
                    k ← i-(j&gsize)+wsize*j[8..lgsize+1]
                    q[j+gsize] ← q[j] + mul(gsize,h,ms,mm,k,ds,d,j)
                else
                    k ← i+gsize+wsize*j[8..lgsize+1]
                    q[j+gsize] ← q[j] - mul(gsize,h,ms,mm,k,ds,d,j)
                endif
            else
                q[j+gsize] ← q[j] + mul(gsize,h,ms,mm,i+wsize*j[8..lgsize],ds,d,j)
            endif
        endfor
        p ← q[128]
        case rnd of
            none, N:
                s ← 0^(h-r) || ~p[r] || p[r]^(r-1)
            Z:
                s ← 0^(h-r) || p[h-1]^r
            F:
                s ← 0^h
            C:
                s ← 0^(h-r) || 1^r
        endcase
        v ← ((as & p[h-1]) || p) + (0 || s)
        if (v[h..r+fsize] = (as & v[r+fsize-1])^(h+1-r-fsize)) or not l then
            w ← (as & v[r+fsize-1])^(gsize-fsize-dpos) || v[fsize-1+r..r] || 0^dpos
        else
            w ← (signed ? (v[h] || ~v[h]^(gsize-dpos-1)) : 1^(gsize-dpos)) || 0^dpos
        endif
    endfor
    RegWrite(ra, 128, a)
enddef

Exceptions 

Access disallowed by virtual address
Access disallowed by tag
Access disallowed by global TB
Access disallowed by local TB
Access detail required by tag
Access detail required by local TB
Access detail required by global TB
Local TB miss
Global TB miss



Wide Multiply Matrix Extract Immediate

These instructions take an address from a general register to fetch a large operand from
memory, a second operand from a general register, perform a group of operations on
partitions of bits in the operands, and catenate the results together, placing the result in a
general register.



W.MUL.MAT.X.I.8.C.B     Wide multiply matrix extract immediate signed bytes big-endian ceiling
W.MUL.MAT.X.I.8.C.L     Wide multiply matrix extract immediate signed bytes little-endian ceiling
W.MUL.MAT.X.I.8.F.B     Wide multiply matrix extract immediate signed bytes big-endian floor
W.MUL.MAT.X.I.8.F.L     Wide multiply matrix extract immediate signed bytes little-endian floor
W.MUL.MAT.X.I.8.N.B     Wide multiply matrix extract immediate signed bytes big-endian nearest
W.MUL.MAT.X.I.8.N.L     Wide multiply matrix extract immediate signed bytes little-endian nearest
W.MUL.MAT.X.I.8.Z.B     Wide multiply matrix extract immediate signed bytes big-endian zero
W.MUL.MAT.X.I.8.Z.L     Wide multiply matrix extract immediate signed bytes little-endian zero
W.MUL.MAT.X.I.16.C.B    Wide multiply matrix extract immediate signed doublets big-endian ceiling
W.MUL.MAT.X.I.16.C.L    Wide multiply matrix extract immediate signed doublets little-endian ceiling
W.MUL.MAT.X.I.16.F.B    Wide multiply matrix extract immediate signed doublets big-endian floor
W.MUL.MAT.X.I.16.F.L    Wide multiply matrix extract immediate signed doublets little-endian floor
W.MUL.MAT.X.I.16.N.B    Wide multiply matrix extract immediate signed doublets big-endian nearest
W.MUL.MAT.X.I.16.N.L    Wide multiply matrix extract immediate signed doublets little-endian nearest
W.MUL.MAT.X.I.16.Z.B    Wide multiply matrix extract immediate signed doublets big-endian zero
W.MUL.MAT.X.I.16.Z.L    Wide multiply matrix extract immediate signed doublets little-endian zero
W.MUL.MAT.X.I.32.C.B    Wide multiply matrix extract immediate signed quadlets big-endian ceiling
W.MUL.MAT.X.I.32.C.L    Wide multiply matrix extract immediate signed quadlets little-endian ceiling
W.MUL.MAT.X.I.32.F.B    Wide multiply matrix extract immediate signed quadlets big-endian floor
W.MUL.MAT.X.I.32.F.L    Wide multiply matrix extract immediate signed quadlets little-endian floor
W.MUL.MAT.X.I.32.N.B    Wide multiply matrix extract immediate signed quadlets big-endian nearest
W.MUL.MAT.X.I.32.N.L    Wide multiply matrix extract immediate signed quadlets little-endian nearest
W.MUL.MAT.X.I.32.Z.B    Wide multiply matrix extract immediate signed quadlets big-endian zero
W.MUL.MAT.X.I.32.Z.L    Wide multiply matrix extract immediate signed quadlets little-endian zero
W.MUL.MAT.X.I.64.C.B    Wide multiply matrix extract immediate signed octlets big-endian ceiling
W.MUL.MAT.X.I.64.C.L    Wide multiply matrix extract immediate signed octlets little-endian ceiling
W.MUL.MAT.X.I.64.F.B    Wide multiply matrix extract immediate signed octlets big-endian floor
W.MUL.MAT.X.I.64.F.L    Wide multiply matrix extract immediate signed octlets little-endian floor
W.MUL.MAT.X.I.64.N.B    Wide multiply matrix extract immediate signed octlets big-endian nearest
W.MUL.MAT.X.I.64.N.L    Wide multiply matrix extract immediate signed octlets little-endian nearest
W.MUL.MAT.X.I.64.Z.B    Wide multiply matrix extract immediate signed octlets big-endian zero
W.MUL.MAT.X.I.64.Z.L    Wide multiply matrix extract immediate signed octlets little-endian zero


W.MUL.MAT.X.I.C.8.C.B   Wide multiply matrix extract immediate complex bytes big-endian ceiling
W.MUL.MAT.X.I.C.8.C.L   Wide multiply matrix extract immediate complex bytes little-endian ceiling
W.MUL.MAT.X.I.C.8.F.B   Wide multiply matrix extract immediate complex bytes big-endian floor
W.MUL.MAT.X.I.C.8.F.L   Wide multiply matrix extract immediate complex bytes little-endian floor
W.MUL.MAT.X.I.C.8.N.B   Wide multiply matrix extract immediate complex bytes big-endian nearest
W.MUL.MAT.X.I.C.8.N.L   Wide multiply matrix extract immediate complex bytes little-endian nearest
W.MUL.MAT.X.I.C.8.Z.B   Wide multiply matrix extract immediate complex bytes big-endian zero
W.MUL.MAT.X.I.C.8.Z.L   Wide multiply matrix extract immediate complex bytes little-endian zero
W.MUL.MAT.X.I.C.16.C.B  Wide multiply matrix extract immediate complex doublets big-endian ceiling
W.MUL.MAT.X.I.C.16.C.L  Wide multiply matrix extract immediate complex doublets little-endian ceiling
W.MUL.MAT.X.I.C.16.F.B  Wide multiply matrix extract immediate complex doublets big-endian floor
W.MUL.MAT.X.I.C.16.F.L  Wide multiply matrix extract immediate complex doublets little-endian floor
W.MUL.MAT.X.I.C.16.N.B  Wide multiply matrix extract immediate complex doublets big-endian nearest
W.MUL.MAT.X.I.C.16.N.L  Wide multiply matrix extract immediate complex doublets little-endian nearest
W.MUL.MAT.X.I.C.16.Z.B  Wide multiply matrix extract immediate complex doublets big-endian zero
W.MUL.MAT.X.I.C.16.Z.L  Wide multiply matrix extract immediate complex doublets little-endian zero
W.MUL.MAT.X.I.C.32.C.B  Wide multiply matrix extract immediate complex quadlets big-endian ceiling
W.MUL.MAT.X.I.C.32.C.L  Wide multiply matrix extract immediate complex quadlets little-endian ceiling
W.MUL.MAT.X.I.C.32.F.B  Wide multiply matrix extract immediate complex quadlets big-endian floor
W.MUL.MAT.X.I.C.32.F.L  Wide multiply matrix extract immediate complex quadlets little-endian floor
W.MUL.MAT.X.I.C.32.N.B  Wide multiply matrix extract immediate complex quadlets big-endian nearest
W.MUL.MAT.X.I.C.32.N.L  Wide multiply matrix extract immediate complex quadlets little-endian nearest
W.MUL.MAT.X.I.C.32.Z.B  Wide multiply matrix extract immediate complex quadlets big-endian zero
W.MUL.MAT.X.I.C.32.Z.L  Wide multiply matrix extract immediate complex quadlets little-endian zero
W.MUL.MAT.X.I.C.64.C.B  Wide multiply matrix extract immediate complex octlets big-endian ceiling
W.MUL.MAT.X.I.C.64.C.L  Wide multiply matrix extract immediate complex octlets little-endian ceiling
W.MUL.MAT.X.I.C.64.F.B  Wide multiply matrix extract immediate complex octlets big-endian floor
W.MUL.MAT.X.I.C.64.F.L  Wide multiply matrix extract immediate complex octlets little-endian floor
W.MUL.MAT.X.I.C.64.N.B  Wide multiply matrix extract immediate complex octlets big-endian nearest
W.MUL.MAT.X.I.C.64.N.L  Wide multiply matrix extract immediate complex octlets little-endian nearest
W.MUL.MAT.X.I.C.64.Z.B  Wide multiply matrix extract immediate complex octlets big-endian zero
W.MUL.MAT.X.I.C.64.Z.L  Wide multiply matrix extract immediate complex octlets little-endian zero


W.MUL.MAT.X.I.M.8.C.B   Wide multiply matrix extract immediate mixed-signed bytes big-endian ceiling
W.MUL.MAT.X.I.M.8.C.L   Wide multiply matrix extract immediate mixed-signed bytes little-endian ceiling
W.MUL.MAT.X.I.M.8.F.B   Wide multiply matrix extract immediate mixed-signed bytes big-endian floor
W.MUL.MAT.X.I.M.8.F.L   Wide multiply matrix extract immediate mixed-signed bytes little-endian floor
W.MUL.MAT.X.I.M.8.N.B   Wide multiply matrix extract immediate mixed-signed bytes big-endian nearest
W.MUL.MAT.X.I.M.8.N.L   Wide multiply matrix extract immediate mixed-signed bytes little-endian nearest
W.MUL.MAT.X.I.M.8.Z.B   Wide multiply matrix extract immediate mixed-signed bytes big-endian zero
W.MUL.MAT.X.I.M.8.Z.L   Wide multiply matrix extract immediate mixed-signed bytes little-endian zero
W.MUL.MAT.X.I.M.16.C.B  Wide multiply matrix extract immediate mixed-signed doublets big-endian ceiling
W.MUL.MAT.X.I.M.16.C.L  Wide multiply matrix extract immediate mixed-signed doublets little-endian ceiling
W.MUL.MAT.X.I.M.16.F.B  Wide multiply matrix extract immediate mixed-signed doublets big-endian floor
W.MUL.MAT.X.I.M.16.F.L  Wide multiply matrix extract immediate mixed-signed doublets little-endian floor
W.MUL.MAT.X.I.M.16.N.B  Wide multiply matrix extract immediate mixed-signed doublets big-endian nearest
W.MUL.MAT.X.I.M.16.N.L  Wide multiply matrix extract immediate mixed-signed doublets little-endian nearest
W.MUL.MAT.X.I.M.16.Z.B  Wide multiply matrix extract immediate mixed-signed doublets big-endian zero
W.MUL.MAT.X.I.M.16.Z.L  Wide multiply matrix extract immediate mixed-signed doublets little-endian zero
W.MUL.MAT.X.I.M.32.C.B  Wide multiply matrix extract immediate mixed-signed quadlets big-endian ceiling
W.MUL.MAT.X.I.M.32.C.L  Wide multiply matrix extract immediate mixed-signed quadlets little-endian ceiling
W.MUL.MAT.X.I.M.32.F.B  Wide multiply matrix extract immediate mixed-signed quadlets big-endian floor
W.MUL.MAT.X.I.M.32.F.L  Wide multiply matrix extract immediate mixed-signed quadlets little-endian floor
W.MUL.MAT.X.I.M.32.N.B  Wide multiply matrix extract immediate mixed-signed quadlets big-endian nearest
W.MUL.MAT.X.I.M.32.N.L  Wide multiply matrix extract immediate mixed-signed quadlets little-endian nearest
W.MUL.MAT.X.I.M.32.Z.B  Wide multiply matrix extract immediate mixed-signed quadlets big-endian zero
W.MUL.MAT.X.I.M.32.Z.L  Wide multiply matrix extract immediate mixed-signed quadlets little-endian zero
W.MUL.MAT.X.I.M.64.C.B  Wide multiply matrix extract immediate mixed-signed octlets big-endian ceiling
W.MUL.MAT.X.I.M.64.C.L  Wide multiply matrix extract immediate mixed-signed octlets little-endian ceiling
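W.MUL.MAT.X.I.M.64.N.L  Wide multiply matrix extract immediate mixed-signed octlets little-endian nearest
W.MUL.MAT.X.I.M.64.Z.B  Wide multiply matrix extract immediate mixed-signed octlets big-endian zero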



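W.MUL.MAT.X.I.U.8.C.B   Wide multiply matrix extract immediate unsigned bytes big-endian ceiling
W.MUL.MAT.X.I.U.8.C.L   Wide multiply matrix extract immediate unsigned bytes little-endian ceiling
W.MUL.MAT.X.I.U.8.F.B   Wide multiply matrix extract immediate unsigned bytes big-endian floor
W.MUL.MAT.X.I.U.8.N.B   Wide multiply matrix extract immediate unsigned bytes big-endian nearest
W.MUL.MAT.X.I.U.8.N.L   Wide multiply matrix extract immediate unsigned bytes little-endian nearest
W.MUL.MAT.X.I.U.16.C.B  Wide multiply matrix extract immediate unsigned doublets big-endian ceiling
W.MUL.MAT.X.I.U.16.C.L  Wide multiply matrix extract immediate unsigned doublets little-endian ceiling
W.MUL.MAT.X.I.U.16.F.B  Wide multiply matrix extract immediate unsigned doublets big-endian floor
W.MUL.MAT.X.I.U.16.F.L  Wide multiply matrix extract immediate unsigned doublets little-endian floor
W.MUL.MAT.X.I.U.16.N.B  Wide multiply matrix extract immediate unsigned doublets big-endian nearest
W.MUL.MAT.X.I.U.16.N.L  Wide multiply matrix extract immediate unsigned doublets little-endian nearest
W.MUL.MAT.X.I.U.32.C.B  Wide multiply matrix extract immediate unsigned quadlets big-endian ceiling
W.MUL.MAT.X.I.U.32.C.L  Wide multiply matrix extract immediate unsigned quadlets little-endian ceiling
W.MUL.MAT.X.I.U.32.F.B  Wide multiply matrix extract immediate unsigned quadlets big-endian floor
W.MUL.MAT.X.I.U.32.F.L  Wide multiply matrix extract immediate unsigned quadlets little-endian floor
W.MUL.MAT.X.I.U.32.N.B  Wide multiply matrix extract immediate unsigned quadlets big-endian nearest
W.MUL.MAT.X.I.U.32.N.L  Wide multiply matrix extract immediate unsigned quadlets little-endian nearest
W.MUL.MAT.X.I.U.64.C.B  Wide multiply matrix extract immediate unsigned octlets big-endian ceiling
W.MUL.MAT.X.I.U.64.C.L  Wide multiply matrix extract immediate unsigned octlets little-endian ceiling
W.MUL.MAT.X.I.U.64.F.B  Wide multiply matrix extract immediate unsigned octlets big-endian floor
W.MUL.MAT.X.I.U.64.F.L  Wide multiply matrix extract immediate unsigned octlets little-endian floor
W.MUL.MAT.X.I.U.64.N.B  Wide multiply matrix extract immediate unsigned octlets big-endian nearest
W.MUL.MAT.X.I.U.64.N.L  Wide multiply matrix extract immediate unsigned octlets little-endian nearest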



Format

W.op.size.rnd rd=rc,rb,i
rd=wopsizernd(rc,rb,i)

 31    24 23   18 17   12 11    6 5  4 3  2 1  0
|   op   |  rd  |  rc  |  rb  | sh | sz | rnd |
     8       6      6      6     2    2    2


sz ← log(size) - 3
case op of
    W.MUL.MAT.X.I, W.MUL.MAT.X.I.C:
        assert size + 6 - log(size) ≥ i ≥ size + 6 - log(size) - 3
        sh ← size + 6 - log(size) - i
    W.MUL.MAT.X.I.M, W.MUL.MAT.X.I.U:
        assert size + 7 - log(size) ≥ i ≥ size + 7 - log(size) - 3
        sh ← size + 7 - log(size) - i
endcase
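As reconstructed above, the immediate i is encoded as a 2-bit sh field measured down from the largest legal value, so each size accepts only four consecutive values of i. The definition below computes h = (2*gsize)+7-lgsize-(ms and bs) and r = h-size-sh. Combined with those, this makes the extraction shift r equal to i when the group size equals size. The Python sketch below covers the encoding step only. It assumes both case arms use a four-value range, as the 2-bit sh field suggests. The name encode_immediate is hypothetical.

```python
from math import log2


def encode_immediate(size: int, i: int, signed_operands: bool):
    """Return (sz, sh) for element size `size` (8..64) and immediate shift i.
    signed_operands is True for the signed and complex forms (size + 6 - log(size)),
    False for the mixed-signed and unsigned forms (size + 7 - log(size))."""
    lg = int(log2(size))
    top = size + (6 if signed_operands else 7) - lg
    if not top - 3 <= i <= top:
        raise ValueError(f"i must lie in {top - 3}..{top} for this form")
    return lg - 3, top - i


print(encode_immediate(16, 18, True))   # signed doublets, largest legal i
print(encode_immediate(8, 9, False))    # unsigned bytes, smallest legal i
```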

Description

The contents of register rc are used as a virtual address, and a value of specified size is loaded
from memory. A second value is the contents of register rb. The values are partitioned into
groups of operands of the size specified and are multiplied and summed, or are convolved,
producing a group of sums. The group of sums is rounded, and limited as specified, yielding
a group of results, each of which is the size specified. The group of results is catenated and
placed in register rd.

The wide multiply-extract-immediate-matrix instructions (W.MUL.MAT.X.I,
W.MUL.MAT.X.I.U, W.MUL.MAT.X.I.M, W.MUL.MAT.X.I.C) perform a partitioned array
multiply of up to 16384 bits, that is 128x128 bits. The width of the array can be limited to
128, 64, 32, or 16 bits, but not smaller than twice the group size, by adding one-half the
desired size in bytes to the virtual address operand: 8, 4, 2, or 1. The array can be limited
vertically to 128, 64, 32, or 16 bits, but not smaller than twice the group size, by adding one-
half the desired memory operand size in bytes to the virtual address operand.

The virtual address must either be aligned to 2048/gsize bytes (or 1024/gsize for
W.MUL.MAT.X.I.C), or must be the sum of an aligned address and one-half of the size of
the memory operand in bytes and/or one-half of the size of the result in bytes. An aligned
address must be an exact multiple of the size expressed in bytes. If the address is not valid an
"access disallowed by virtual address" exception occurs.

Z (zero) rounding is not defined for unsigned extract operations, and a ReservedInstruction
exception is raised if attempted. F (floor) rounding will properly round unsigned results
downward.



A wide-multiply-extract-immediate-matrix doublets instruction (W.MUL.MAT.X.I.16 or
W.MUL.MAT.X.I.U.16) multiplies memory [m63 m62 m61 ... m2 m1 m0] with vector [h g f
e d c b a], yielding the products [am7+bm15+cm23+dm31+em39+fm47+gm55+hm63 ...
am2+bm10+cm18+dm26+em34+fm42+gm50+hm58
am1+bm9+cm17+dm25+em33+fm41+gm49+hm57

am0+bm8+cm16+dm24+em32+fm40+gm48+hm56], rounded and limited as specified:






Wide multiply matrix extract immediate doublets 



A wide-multiply-matrix-extract-immediate-complex-doublets instruction

(W.MUL.MAT.X.I.C.16) multiplies memory [m31 m30 m29 ... m2 m1 m0] with vector [h g
f e d c b a], yielding the products [am7+bm6+cm15+dm14+em23+fm22+gm31+hm30 ...
am2-bm3+cm10-dm11+em18-fm19+gm26-hm27

am1+bm0+cm9+dm8+em17+fm16+gm25+hm24 am0-bm1+cm8-dm9+em16-fm17+gm24-
hm25], rounded and limited as specified:




Wide multiply matrix extract immediate complex doublets 

Definition 

def mul(size,h,vs,v,i,ws,w,j) as
enddef

def WideMultiplyExtractImmediateMatrix(op,gsize,rd,rc,rb,i)
    c ← RegRead(rc, 64)
    b ← RegRead(rb, 128)
    case op of
        W.MUL.MAT.X.I.B, W.MUL.MAT.X.I.L, W.MUL.MAT.X.I.U.B, W.MUL.MAT.X.I.U.L,
        W.MUL.MAT.X.I.M.B, W.MUL.MAT.X.I.M.L:
            if c[lgsize-4..0] ≠ 0 then
                raise AccessDisallowedByVirtualAddress
            endif
            if c[3..lgsize-3] ≠ 0 then
                wsize ← (c and (0-c)) || 0^4
                t ← c and (c-1)
            else
                wsize ← 128
                t ← c
            endif
            lwsize ← log(wsize)
            if t[lwsize+6-lgsize..lwsize-3] ≠ 0 then
                msize ← (t and (0-t)) || 0^4
                VirtAddr ← t and (t-1)
            else
                msize ← 128*wsize/gsize
                VirtAddr ← t
            endif
            vsize ← msize*gsize/wsize
        W.MUL.MAT.X.I.C.B, W.MUL.MAT.X.I.C.L:
            if c[lgsize-4..0] ≠ 0 then
                raise AccessDisallowedByVirtualAddress
            endif
            if c[3..lgsize-3] ≠ 0 then
                wsize ← (c and (0-c)) || 0^4
                t ← c and (c-1)
            else
                wsize ← 128
                t ← c
            endif
            lwsize ← log(wsize)
                msize ← (t and (0-t)) || 0^4
                VirtAddr ← t and (t-1)
            else
                msize ← 64*wsize/gsize
                VirtAddr ← t
            endif
            vsize ← 2*msize*gsize/wsize
    endcase
    case op of
        W.MUL.MAT.X.I.B, W.MUL.MAT.X.I.U.B, W.MUL.MAT.X.I.M.B, W.MUL.MAT.X.I.C.B:
            order ← B
        W.MUL.MAT.X.I.L, W.MUL.MAT.X.I.U.L, W.MUL.MAT.X.I.M.L, W.MUL.MAT.X.I.C.L:
            order ← L
    endcase
    case op of
        W.MUL.MAT.X.I.U.B, W.MUL.MAT.X.I.U.L:
            as ← ms ← bs ← 0
        W.MUL.MAT.X.I.M.B, W.MUL.MAT.X.I.M.L:
            bs ← 0
            as ← ms ← 1
        W.MUL.MAT.X.I.B, W.MUL.MAT.X.I.L, W.MUL.MAT.X.I.C.B, W.MUL.MAT.X.I.C.L:
            as ← ms ← bs ← 1
    endcase
    m ← LoadMemory(c,VirtAddr,msize,order)
    h ← (2*gsize)+7-lgsize-(ms and bs)
    r ← h-size-sh
    for i ← 0 to wsize-gsize by gsize
        q[0] ← 0^(2*gsize+7-lgsize)
        for j ← 0 to vsize-gsize by gsize
            case op of
                W.MUL.MAT.X.I.B, W.MUL.MAT.X.I.L, W.MUL.MAT.X.I.U.B, W.MUL.MAT.X.I.U.L,
                W.MUL.MAT.X.I.M.B, W.MUL.MAT.X.I.M.L:

                W.MUL.MAT.X.I.C.B, W.MUL.MAT.X.I.C.L:
                    if (~i) & j & gsize = 0 then

            endcase
        endfor
        p ← q[vsize]
        case rnd of
            none, N:
                s ← 0^(h-r) || ~p[r] || p[r]^(r-1)
            Z:
                s ← 0^(h-r) || p[h-1]^r
            F:
                s ← 0^h
            C:
                s ← 0^(h-r) || 1^r
        endcase
        v ← ((as & p[h-1]) || p) + (0 || s)
        if v[h..r+gsize] = (as & v[r+gsize-1])^(h+1-r-gsize) then
            a[gsize-1+i..i] ← v[gsize-1+r..r]
        else
            a[gsize-1+i..i] ← as ? (v[h] || ~v[h]^(gsize-1)) : 1^gsize
        endif
    endfor
    a[127..wsize] ← 0
    RegWrite(rd, 128, a)
enddef
Exceptions

Access disallowed by virtual address
Access disallowed by tag
Access disallowed by global TB
Access disallowed by local TB
Access detail required by tag
Access detail required by local TB
Access detail required by global TB
Local TB miss
Global TB miss
Wide Multiply Matrix Floating-point

These instructions take an address from a general register to fetch a large operand from
memory, a second operand from a general register, perform a group of operations on
partitions of bits in the operands, and catenate the results together, placing the result in a
general register.



Operation codes



W.MUL.MAT.C.F.16.B   Wide multiply matrix complex floating-point half big-endian
W.MUL.MAT.C.F.16.L   Wide multiply matrix complex floating-point half little-endian
W.MUL.MAT.C.F.32.B   Wide multiply matrix complex floating-point single big-endian
W.MUL.MAT.C.F.32.L   Wide multiply matrix complex floating-point single little-endian
W.MUL.MAT.C.F.64.B   Wide multiply matrix complex floating-point double big-endian
W.MUL.MAT.C.F.64.L   Wide multiply matrix complex floating-point double little-endian
W.MUL.MAT.F.16.B     Wide multiply matrix floating-point half big-endian
W.MUL.MAT.F.16.L     Wide multiply matrix floating-point half little-endian
W.MUL.MAT.F.32.B     Wide multiply matrix floating-point single big-endian
W.MUL.MAT.F.32.L     Wide multiply matrix floating-point single little-endian
W.MUL.MAT.F.64.B     Wide multiply matrix floating-point double big-endian
W.MUL.MAT.F.64.L     Wide multiply matrix floating-point double little-endian



Format 

W.op.size.order rd=rc,rb
rd=wopsizeorder(rc,rb)

 31           24 23   18 17   12 11    6 5  4 3    0
| W.MINOR.order |  rd  |  rc  |  rb  | sz |  op  |
        8           6      6      6     2     4

Description

The contents of register rc are used as a virtual address, and a value of specified size is loaded
from memory. A second value is the contents of register rb. The values are partitioned into
groups of operands of the size specified. The second values are multiplied with the first
values, then summed, producing a group of result values. The group of result values is
catenated and placed in register rd.

The wide multiply matrix floating-point instructions (W.MUL.MAT.F, W.MUL.MAT.C.F)
perform a partitioned array multiply of up to 16384 bits, that is 128x128 bits. The width of
the array can be limited to 128, 64, or 32 bits, but not smaller than twice the group size, by
adding one-half the desired size in bytes to the virtual address operand: 8, 4, or 2. The array
can be limited vertically to 128, 64, 32, or 16 bits, but not smaller than twice the group size,
by adding one-half the desired memory operand size in bytes to the virtual address operand.
The virtual address must either be aligned to 2048/gsize bytes (or 1024/gsize for
W.MUL.MAT.C.F), or must be the sum of an aligned address and one-half of the size of the
memory operand in bytes and/or one-half of the size of the result in bytes. An aligned
address must be an exact multiple of the size expressed in bytes. If the address is not valid an
"access disallowed by virtual address" exception occurs.

A wide-multiply-matrix-floating-point-half instruction (W.MUL.MAT.F) multiplies memory
[m31 m30 ... m1 m0] with vector [h g f e d c b a], yielding products
[hm31+gm27+...+bm7+am3 ... hm28+gm24+...+bm4+am0]:





Wide multiply matrix floating-point half 
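For the floating-point forms, the memory and vector elements are IEEE 754 bit patterns. The sketch below uses Python's struct format code 'e' (binary16) to decode half-precision patterns. It then forms result k as the sum over j of v[j]·m[k + ncols·j], matching the half-precision example above (ncols = 4, eight vector elements). This is a simplification: accumulation is done in Python's double precision rather than rounded to the group precision at each fadd. The name mulmat_f16 is hypothetical.

```python
import struct


def half_to_float(bits: int) -> float:
    """Decode a 16-bit IEEE 754 binary16 pattern."""
    return struct.unpack("<e", struct.pack("<H", bits & 0xFFFF))[0]


def mulmat_f16(m_bits, v_bits, ncols):
    m = [half_to_float(x) for x in m_bits]
    v = [half_to_float(x) for x in v_bits]
    if len(m) != ncols * len(v):
        raise ValueError("memory operand must hold ncols * len(v) elements")
    out = []
    for k in range(ncols):
        acc = 0.0
        for j in range(len(v)):          # same j order as the definition's fadd chain
            acc += v[j] * m[k + ncols * j]
        out.append(acc)
    return out


ONE, HALF = 0x3C00, 0x3800               # 1.0 and 0.5 in binary16
print(mulmat_f16([ONE] * 32, [HALF] * 8, ncols=4))
```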

A wide-multiply-matrix-complex-floating-point-half instruction (W.MUL.MAT.C.F) multiplies
memory [m15 m14 ... m1 m0] with vector [h g f e d c b a], yielding products
[hm14+gm15+...+bm2+am3 ... hm12+gm13+...+bm0+am1 -hm13+gm12+...-
bm1+am0]:

Wide multiply matrix complex floating-point half 

Definition 

def mul(size,v,i,w,j) as
    mul ← fmul(F(size,v[size-1+i..i]),F(size,w[size-1+j..j]))
enddef

def MemoryFloatingPointMultiply(major,op,gsize,rd,rc,rb)
    c ← RegRead(rc, 64)
    b ← RegRead(rb, 128)
    lgsize ← log(gsize)
    case op of
        W.MUL.MAT.F.16, W.MUL.MAT.F.32, W.MUL.MAT.F.64:
            if c[lgsize-4..0] ≠ 0 then
                raise AccessDisallowedByVirtualAddress
            endif
            if c[3..lgsize-3] ≠ 0 then
                wsize ← (c and (0-c)) || 0^4
                t ← c and (c-1)
            else
                wsize ← 128
                t ← c
            endif
            lwsize ← log(wsize)
            if t[lwsize+4-lgsize..lwsize-3] ≠ 0 then
                msize ← (t and (0-t)) || 0^4
                VirtAddr ← t and (t-1)
            else
                msize ← 128*wsize/gsize
                VirtAddr ← t
            endif
            vsize ← msize*gsize/wsize
        W.MUL.MAT.C.F.16, W.MUL.MAT.C.F.32, W.MUL.MAT.C.F.64:
            if c[lgsize-4..0] ≠ 0 then
                raise AccessDisallowedByVirtualAddress
            endif
            if c[3..lgsize-3] ≠ 0 then
                wsize ← (c and (0-c)) || 0^4
                t ← c and (c-1)
            else
                wsize ← 128
                t ← c
            endif
            if t[lwsize+4-lgsize..lwsize-3] ≠ 0 then
                msize ← (t and (0-t)) || 0^4
                VirtAddr ← t and (t-1)
            else
                msize ← 64*wsize/gsize
                VirtAddr ← t
            endif
            vsize ← 2*msize*gsize/wsize
    endcase
    case major of
        W.MINOR.B:
            order ← B
        W.MINOR.L:
            order ← L
    endcase
    m ← LoadMemory(c,VirtAddr,msize,order)
    for i ← 0 to wsize-gsize by gsize
        q[0].t ← NULL
        for j ← 0 to vsize-gsize by gsize
            case op of
                W.MUL.MAT.F.16, W.MUL.MAT.F.32, W.MUL.MAT.F.64:
                    q[j+gsize] ← fadd(q[j], mul(gsize,m,i+wsize*j[8..lgsize],b,j))
                W.MUL.MAT.C.F.16, W.MUL.MAT.C.F.32, W.MUL.MAT.C.F.64:
                    if (~i) & j & gsize = 0 then
                        k ← i-(j&gsize)+wsize*j[8..lgsize+1]
                        q[j+gsize] ← fadd(q[j], mul(gsize,m,k,b,j))
                    else
                        k ← i+gsize+wsize*j[8..lgsize+1]
                        q[j+gsize] ← fsub(q[j], mul(gsize,m,k,b,j))
                    endif
            endcase
        endfor
        a[gsize-1+i..i] ← q[vsize]
    endfor
    a[127..wsize] ← 0
    RegWrite(rd, 128, a)
enddef
Access disallowed by global TB
Access disallowed by local TB
Access detail required by tag
Access detail required by local TB
Access detail required by global TB
Local TB miss
Global TB miss
Wide Multiply Matrix Galois

These instructions take an address from a general register to fetch a large operand from
memory, second and third operands from general registers, perform a group of operations
on partitions of bits in the operands, and catenate the results together, placing the result in a
general register.



Operation codes 



W.MUL.MAT.G.B    Wide multiply matrix Galois big-endian
W.MUL.MAT.G.L    Wide multiply matrix Galois little-endian



Format 

W.MUL.MAT.G.order ra=rc,rd,rb
ra=wmulmatgorder(rc,rd,rb)

 31               24 23   18 17   12 11    6 5    0
| W.MUL.MAT.G.order |  rd  |  rc  |  rb  |  ra  |
          8             6      6      6      6

Description 

The contents of register rc are used as a virtual address, and a value of specified size is loaded
from memory. Second and third values are the contents of registers rd and rb. The values are
partitioned into groups of operands of the size specified. The second values are multiplied as
polynomials with the first value, producing a result which is reduced to the Galois field
specified by the third value, producing a group of result values. The group of result values is
catenated and placed in register ra.

The wide multiply matrix Galois instruction (W.MUL.MAT.G) performs a partitioned array
multiply of up to 16384 bits, that is 128x128 bits. The width of the array can be limited to
128, 64, 32, or 16 bits, but not smaller than twice the group size of 8 bits, by adding one-half
the desired size in bytes to the virtual address operand: 8, 4, 2, or 1. The array can be limited
vertically to 128, 64, 32, or 16 bits, but not smaller than twice the group size of 8 bits, by
adding one-half the desired memory operand size in bytes to the virtual address operand.

The virtual address must either be aligned to 256 bytes, or must be the sum of an aligned
address and one-half of the size of the memory operand in bytes and/or one-half of the size
of the result in bytes. An aligned address must be an exact multiple of the size expressed in
bytes. If the address is not valid an "access disallowed by virtual address" exception occurs.



A wide-multiply-matrix-Galois instruction (W.MULMAT.G) multiplies memory [m255
m254 ... m1 m0] with vector [ponmlkjihgfedcba], reducing the result modulo
polynomial [q], yielding products [(pm255+om247+...+bm31+am15 mod q)
(pm254+om246+...+bm30+am14 mod q) ... (pm248+om240+...+bm16+am0 mod q)].
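Each product above is a sum over one column of the matrix. The instruction therefore behaves like an ordinary vector-matrix product in which multiplication is carry-less polynomial multiplication and addition is XOR, with a single reduction modulo q per result. Reducing after summing is equivalent to reducing each term, because reduction is linear over GF(2).

The following is a minimal Python sketch. It assumes byte elements (gsize = 8) and a matrix given as a list of rows. It does not model memory byte order or the reduced array sizes. `poly_multiply`, `poly_residue` and `mulmat_galois` are hypothetical names that mirror the PolyMultiply and PolyResidue helpers in the definition. The reduction polynomial is assumed to be given without its leading x^8 term. The test value 0x1B (x^8 + x^4 + x^3 + x + 1) is the AES field, used only as a familiar example.

```python
from __future__ import annotations


def poly_multiply(size: int, a: int, b: int) -> int:
    """Carry-less product of two size-bit polynomials over GF(2)."""
    p = 0
    for k in range(size):
        if (a >> k) & 1:
            p ^= b << k            # add b * x**k; addition is XOR
    return p


def poly_residue(size: int, a: int, b: int) -> int:
    """Reduce a polynomial of up to 2*size bits modulo x**size + b."""
    modulus = (1 << size) | b      # restore the implicit leading term
    for k in range(size - 1, -1, -1):
        if (a >> (size + k)) & 1:  # clear bits 2*size-1 down to size
            a ^= modulus << k
    return a & ((1 << size) - 1)


def mulmat_galois(m: list[list[int]], d: list[int], q: int, size: int = 8) -> list[int]:
    """Vector d times matrix m over GF(2**size); m[j][i] is row j, column i."""
    out = []
    for i in range(len(m[0])):
        acc = 0
        for j, dj in enumerate(d):
            acc ^= poly_multiply(size, m[j][i], dj)
        out.append(poly_residue(size, acc, q))   # one reduction per result
    return out
```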




[Figure: Wide multiply matrix Galois]

Definition 

def c ← PolyMultiply(size,a,b) as
    p[0] ← 0^{2*size}
    for k ← 0 to size-1
        p[k+1] ← p[k] xor (a_k ? (0^{size-k} || b || 0^{k}) : 0^{2*size})
    endfor
    c ← p[size]
enddef

def c ← PolyResidue(size,a,b) as
    p[size] ← a
    for k ← size-1 to 0 by -1
        p[k] ← p[k+1] xor (p[k+1]_{size+k} ? (0^{size-k-1} || 1 || b || 0^{k}) : 0^{2*size})
    endfor
    c ← p[0]_{size-1..0}
enddef

def WideMultiplyGalois(op,gsize,rd,rc,rb,ra)
    d ← RegRead(rd, 128)
    c ← RegRead(rc, 64)
    b ← RegRead(rb, 128)
    gsize ← 8
    lgsize ← log(gsize)
    if c_{lgsize-4..0} ≠ 0 then
        raise AccessDisallowedByVirtualAddress
    endif
    if c_{3..lgsize-3} ≠ 0 then
        wsize ← (c and (0-c)) || 0^{4}
        t ← c and (c-1)
    else
        wsize ← 128
        t ← c
    endif
    lwsize ← log(wsize)
    if t_{lwsize+3-lgsize..lwsize-3} ≠ 0 then
        msize ← (t and (0-t)) || 0^{4}
        VirtAddr ← t and (t-1)
    else
        msize ← 128*wsize/gsize
        VirtAddr ← t
    endif
    vsize ← msize*gsize/wsize
    case op of
        W.MULMAT.G.B:
            order ← B
        W.MULMAT.G.L:
            order ← L
    endcase
    m ← LoadMemory(c,VirtAddr,msize,order)
    for i ← 0 to wsize-gsize by gsize
        q[0] ← 0^{2*gsize}
        for j ← 0 to vsize-gsize by gsize
            k ← i + wsize*j_{8..lgsize}
            q[j+gsize] ← q[j] xor PolyMultiply(gsize,m_{k+gsize-1..k},d_{j+gsize-1..j})
        endfor
        a_{i+gsize-1..i} ← PolyResidue(gsize,q[vsize],b_{gsize-1..0})
    endfor
    a_{127..wsize} ← 0
    RegWrite(ra, 128, a)
enddef

Exceptions

Access disallowed by virtual address
Access disallowed by tag
Access disallowed by global TB
Access disallowed by local TB
Access detail required by tag
Access detail required by local TB
Access detail required by global TB
Local TB miss
Global TB miss
Wide Switch

These instructions take an address from a general register to fetch a large operand from
memory, a second operand from a general register, perform a group of operations on
partitions of bits in the operands, and catenate the results together, placing the result in a
general register.



Operation codes



W.SWITCH.B


Wide switch big-endian 


W.SWITCH.L


Wide switch little-endian



Format

op ra=rc,rd,rb
ra=op(rc,rd,rb)

31 24 23 18 17 12 11 6 5 0
| op | rd | rc | rb | ra |
8 6 6 6 6

Description

The contents of register rc is specified as a virtual address and optionally an operand size,
and a value of specified size is loaded from memory. A second value is the catenated
contents of registers rd and rb. Eight corresponding bits from the memory value are used to
select a single result bit from the second value, for each corresponding bit position. The
group of results is catenated and placed in register ra.
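The bit selection can be pictured as eight 128-bit planes of the memory operand. For result bit i, bit i of each plane contributes one bit of an 8-bit index into the 256-bit catenation of rd and rb.

The following is a minimal Python sketch under these assumptions:
- the operand is the full 128-byte size;
- plane p sits at bit offset 128*p and supplies bit p of the index;
- d forms the high half of the catenation.

Byte order and the reduced-size case are not modelled, and `wide_switch` is a hypothetical name.

```python
def wide_switch(m: int, d: int, b: int) -> int:
    """Model of the full-size (128-byte operand) wide switch.

    m is the 1024-bit memory operand viewed as eight 128-bit planes, plane p
    occupying bits 128*p .. 128*p+127. Bit i of plane p supplies bit p of the
    8-bit selector for result bit i, which picks one bit of the 256-bit d||b.
    """
    db = (d << 128) | b                 # d supplies the high half
    a = 0
    for i in range(128):
        k = 0
        for p in range(8):
            k |= ((m >> (128 * p + i)) & 1) << p
        a |= ((db >> k) & 1) << i
    return a
```

A memory operand whose planes spell out each bit's own position acts as an identity and returns b unchanged. Setting every bit of plane 7 as well selects d instead.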

The virtual address must either be aligned to 128 bytes, or must be the sum of an aligned
address and one-half of the size of the memory operand in bytes. An aligned address must
be an exact multiple of the size expressed in bytes. The size of the memory operand must be
8, 16, 32, 64, or 128 bytes. If the address is not valid an "access disallowed by virtual
address" exception occurs. When a size smaller than 128 bits is specified, the high order bits
of the memory operand are replaced with values corresponding to the bit position, so that
the same memory operand specifies a bit selection within symbols of the operand size, and
the same operation is performed on each symbol.

Definition

def WideSwitch(op,rd,rc,rb,ra)
    d ← RegRead(rd, 128)
    c ← RegRead(rc, 64)
    b ← RegRead(rb, 128)
    if c_{1..0} ≠ 0 then
        raise AccessDisallowedByVirtualAddress
    elseif c_{6..0} ≠ 0 then
        VirtAddr ← c and (c-1)
        wsize ← (c and (0-c)) || 0^{1}
    else
        VirtAddr ← c
        wsize ← 128
    endif
    msize ← 8*wsize
    lwsize ← log(wsize)
    case op of
        W.SWITCH.B:
            order ← B
        W.SWITCH.L:
            order ← L
    endcase
    m ← LoadMemory(c,VirtAddr,msize,order)
    db ← d || b
    for i ← 0 to 127
        j ← 0 || i_{lwsize-1..0}
        k ← m_{7*wsize+j} || m_{6*wsize+j} || m_{5*wsize+j} || m_{4*wsize+j} || m_{3*wsize+j} || m_{2*wsize+j} || m_{wsize+j} || m_{j}
        a_i ← db_k
    endfor
    RegWrite(ra, 128, a)
enddef

Exceptions 

Access disallowed by virtual address
Access disallowed by tag
Access disallowed by global TB
Access disallowed by local TB
Access detail required by tag
Access detail required by local TB
Access detail required by global TB
Local TB miss
Global TB miss
Wide Translate

These instructions take an address from a general register to fetch a large operand from
memory, a second operand from a general register, perform a group of operations on
partitions of bits in the operands, and catenate the results together, placing the result in a
general register.

Operation codes



W.TRANSLATE.8.B 


Wide translate bytes big-endian 


W.TRANSLATE.16.B


Wide translate doublets big-endian


W.TRANSLATE.32.B


Wide translate quadlets big-endian 


W.TRANSLATE.64.B


Wide translate octlets big-endian


W.TRANSLATE.8.L


Wide translate bytes little-endian


W.TRANSLATE.16.L


Wide translate doublets little-endian 


W.TRANSLATE.32.L 


Wide translate quadlets little-endian 


W.TRANSLATE.64.L


Wide translate octlets little-endian 



Format

W.TRANSLATE.size.order rd=rc,rb

rd=wtranslatesizeorder(rc,rb)

31 24 23 18 17 12 11 6 5 4 3 0
| W.TRANSLATE.order | rd | rc | rb | ... |
8 6 6 6 6



Description

The contents of register rc is used as a virtual address, and a value of specified size is loaded
from memory. A second value is the contents of register rb. The values are partitioned into
groups of operands of a size specified. The low-order bytes of the second group of values
are used as addresses to choose entries from one or more tables constructed from the first
value, producing a group of values. The group of results is catenated and placed in register
rd.

By default, the total width of tables is 128 bits, and a total table width of 128, 64, 32, 16 or 8
bits, but not less than the group size, may be specified by adding the desired total table width
in bytes to the specified address: 16, 8, 4, 2, or 1. When fewer than 128 bits are specified, the
tables repeat to fill the 128-bit width.

The default depth of each table is 256 entries, or in bytes is 32 times the group size in bits.
An operation may specify 4, 8, 16, 32, 64, 128 or 256 entry tables, by adding one half of the
memory operand size to the address. Table index values are masked to ensure that only the
specified portion of the table is used. Tables with just 2 entries cannot be specified; if 2-entry
tables are desired, it is recommended to load the entries into registers and use G.MUX to
select the table entries.
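The following is a minimal Python sketch of the lookup under these assumptions:
- the memory operand has already been loaded into an integer whose row r (bits r*wsize through r*wsize+wsize-1) holds entry r of every table side by side;
- depth is a power of two no larger than 256;
- the big-endian form simply complements the masked index, as the note later in this section describes.

`wide_translate` and its parameters are hypothetical.

```python
def wide_translate(m: int, b: int, gsize: int, wsize: int = 128,
                   depth: int = 256, big_endian: bool = False) -> int:
    """Translate each gsize-bit group of b through tables held in m.

    m holds `depth` rows of `wsize` bits; row r is table entry r for every
    table side by side, and the tables repeat across the 128-bit result when
    wsize < 128.
    """
    a = 0
    mask = (1 << gsize) - 1
    for i in range(0, 128, gsize):
        index = (b >> i) & (depth - 1)      # masked index from the low-order bits
        if big_endian:
            index ^= depth - 1              # complemented index (big-endian form)
        j = index * wsize + (i % wsize)     # row offset plus column within the row
        a |= ((m >> j) & mask) << i
    return a
```

With a table whose entry r holds the value r, little-endian translation returns its input unchanged. The masked depth-4 case shows that only the low two index bits are used.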
Failing to initialize the entire table is a potential security hole, as an instruction
with a small-depth table could access table entries previously initialized by an
instruction with a large-depth table. We could close this hole either by initializing the
entire table, even if extra cycles are required, or by masking the index bits so that
only the initialized portion of the table is used. Initializing the entire table with no
penalty in cycles could require writing to as many as 128 entries at once, which is
quite likely to cause circuit complications. Initializing the entire table with writes to
only one entry at a time requires 256 write cycles, even when the table is smaller.
Masking the index bits is the preferred solution.

Masking the index bits suggests that this instruction, for tables larger than 256
entries, may be useful for a general-purpose memory translate function where the
processor performs enough independent load operations to fill the 128 bits. Thus, the
16, 32, and 64 bit versions of this function perform the equivalent of 8, 4, or 2 withdraw,
8, 4, or 2 load-indexed, and 7, 3, or 1 group-extract instructions. In other words, this
instruction can be as powerful as 23, 11, or 5 existing instructions. The 8-bit version is
a single-cycle operation replacing 47 existing instructions, so these are not as big a
win, but nonetheless, this is at least a 50% improvement on a 2-issue processor, even
with one-cycle-per-load timing. To make this possible, the default table size would
become 65536, 2^32 and 2^64 for 16, 32 and 64-bit versions of the instruction.

For the big-endian version of this instruction, in the definition below, the contents of
register rb is complemented. This reflects a desire to organize the table so that the
lowest addressed table entries are selected when the index is zero. In the logical
implementation, complementing the index can be avoided by loading the table
memory differently for big-endian and little-endian versions. A consequence of this
shortcut is that a table loaded by a big-endian translate instruction cannot be used by
a little-endian translate instruction, and vice-versa.

The virtual address must either be aligned to 4096 bytes, or must be the sum of an aligned
address and one-half of the size of the memory operand in bytes and/or the desired total
table width in bytes. An aligned address must be an exact multiple of the size expressed in
bytes. The size of the memory operand must be a power of two from 4 to 4096 bytes, but
must be at least 4 times the group size and 4 times the total table width. If the address is not
valid an "access disallowed by virtual address" exception occurs.



Definition 

def WideTranslate(op,gsize,rd,rc,rb)
    c ← RegRead(rc, 64)
    b ← RegRead(rb, 128)
    lgsize ← log(gsize)
    if c_{lgsize-4..0} ≠ 0 then
        raise AccessDisallowedByVirtualAddress
    endif
    if c_{4..lgsize-3} ≠ 0 then
        wsize ← (c and (0-c)) || 0^{3}
        t ← c and (c-1)
    else
        wsize ← 128
        t ← c
    endif
    lwsize ← log(wsize)
    if t_{lwsize+4..lwsize-2} ≠ 0 then
        msize ← (t and (0-t)) || 0^{4}
        VirtAddr ← t and (t-1)
    else
        msize ← 256*wsize
        VirtAddr ← t
    endif
    case op of
        W.TRANSLATE.B:
            order ← B
        W.TRANSLATE.L:
            order ← L
    endcase
    m ← LoadMemory(c,VirtAddr,msize,order)
    vsize ← msize/wsize
    lvsize ← log(vsize)
    for i ← 0 to 128-gsize by gsize
        j ← ((order=B)^{lvsize} xor b_{lvsize-1+i..i}) * wsize + i_{lwsize-1..0}
        a_{i+gsize-1..i} ← m_{j+gsize-1..j}
    endfor
    RegWrite(rd, 128, a)
enddef

Exceptions 

Access disallowed by virtual address
Access disallowed by tag
Access disallowed by global TB
Access disallowed by local TB
Access detail required by tag
Access detail required by local TB
Access detail required by global TB
Local TB miss
Global TB miss



Memory Management



This section discusses the caches, the translation mechanisms, the memory interfaces, and
how the multiprocessor interface is used to maintain cache coherence.

Overview 



The Zeus processor provides for both local and global virtual addressing, arbitrary page
sizes, and coherent-cache multiprocessing. The memory management system is designed to
provide the requirements for implementation of virtual machines as well as virtual memory.

All facilities of the memory management system are themselves memory mapped, in order to
provide for the manipulation of these facilities by compiled high-level-language code.

The translation mechanism is designed to allow full byte-at-a-time control of access to the
virtual address space, with the assistance of fast exception handlers.

Privilege levels provide for the secure transition between insecure user code and secure
system facilities. Instructions execute at a privilege level specified by a two-bit field in the
access information. Zero is the least privileged level, and three is the most privileged level.

The diagram below sketches the basic organization of the memory management system:



[Figure: memory management organization, showing local virtual to global virtual translation and virtual to physical translation]



In general terms, the memory management starts from a local virtual address. The local
virtual address is translated to a global virtual address by an LTB (Local Translation Buffer).
In turn, the global virtual address is translated to a physical address by a GTB (Global
Translation Buffer). One of the addresses, a local virtual address, a global virtual address, or
a physical address, is used to index the cache data and cache tag arrays, and one of the
addresses is used to check the cache tag array for cache presence. Protection information is
assembled from the LTB, GTB, and optionally the cache tag, to determine if the access is
legal.

This form varies somewhat, depending on implementation choices made. Because the LTB
leaves the lower 48 bits of the address alone, indexing of the cache arrays with the local
virtual address is usually identical to cache arrays indexed by the global virtual address.
However, indexing cache arrays by the global virtual address rather than the physical address
produces a coherence issue if the mapping from global virtual address to physical is
many-to-one.

Starting from a local virtual address, the memory management system performs three actions
in parallel: the low-order bits of the virtual address are used to directly access the data in the
cache, a low-order bit field is used to access the cache tag, and the high-order bits of the
virtual address are translated from a local address space to a global virtual address space.

Following these three actions, operations vary depending upon the cache implementation. 
The cache tag may contain either a physical address and access control information (a 
physically-tagged cache), or may contain a global virtual address and global protection
information (a virtually-tagged cache). 

For a physically-tagged cache, the global virtual address is translated to a physical address by 
the GTB, which generates global protection information. The cache tag is checked against 
the physical address, to determine a cache hit. In parallel, the local and global protection 
information is checked. 

For a virtually-tagged cache, the cache tag is checked against the global virtual address, to 
determine a cache hit, and the local and global protection information is checked. If the 
cache misses, the global virtual address is translated to a physical address by the GTB, which
also generates the global protection information.

Local Translation Buffer 

The 64-bit global virtual address space is global among all tasks. In a multitask environment, 
requirements for a task-local address space arise from operations such as the UNIX "fork" 
function, in which a task is duplicated into parent and child tasks, each now having a unique 
virtual address space. In addition, when switching tasks, access to one task's address space
must be disabled and another task's access enabled. 

Zeus provides for portions of the address space to be made local to individual tasks, with a
translation to the global virtual space specified by four 16-bit registers for each local virtual
space. The registers specify a mask selecting which of the high order 16 address bits are
checked to match a particular value, and if they match, a value with which to modify the
virtual address. Zeus avoids setting a fixed page size or local address size; these can be set by
software conventions.
A local virtual address space is specified by the following:



field name   size   description
lm           16     mask to select fields of local virtual address to perform match over
la           16     value to perform match with masked local virtual address
lx           16     value to xor with local virtual address if matched
lp           16     local protection field (detailed later)

local virtual address space specifiers



Physical address 

There are as many LTBs as threads, and up to 2^3 (8) entries per LTB. Each entry is 128 bits,
with the high order 64 bits reserved. The physical address of an LTB entry for thread th, entry
en, byte b is:

63 24 23 19 18 7 6 4 3 0
| FFFF FFFF 0A00 0000_{63..24} | th | 0 | en | b |
40 5 12 3 4
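The layout above amounts to OR-ing shifted fields into a fixed base. The following is a minimal Python sketch, assuming th, en and b are within their 5-, 3- and 4-bit fields. The range check only guards the field widths: whether a given entry or thread actually exists depends on LE and T, which the definition below checks. `ltb_entry_address` is a hypothetical name.

```python
LTB_BASE = 0xFFFF_FFFF_0A00_0000   # bits 63..24 of every LTB entry address


def ltb_entry_address(th: int, en: int, b: int) -> int:
    """Physical address of byte b of LTB entry en for thread th."""
    if not (0 <= th < 1 << 5 and 0 <= en < 1 << 3 and 0 <= b < 1 << 4):
        raise ValueError("field out of range")
    return LTB_BASE | (th << 19) | (en << 4) | b
```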



Definition

def data,flags ← AccessPhysicalLTB(pa,op,wdata) as
    th ← pa_{23..19}
    en ← pa_{6..4}
    if (en < (1 || 0^{LE})) and (th < T) and (pa_{18..7} = 0) then
        case op of
            R:
                data ← 0^{64} || LocalTB[th][en]
            W:
                LocalTB[th][en] ← wdata_{63..0}
        endcase
    else
        data ← 0
    endif
enddef

Entry Format 

These 16-bit values are packed together into a 64-bit LTB entry as follows:

63 48 47 32 31 16 15 0
| lm | la | lx | lp |
16 16 16 16



The LTB contains a separate context of register sets for each thread, indicated by the th
index above. A context consists of one or more sets of lm/la/lx/lp registers, one set for
each simultaneously accessible local virtual address range, indicated by the en index above.
This set of registers is called the "Local TB context," or LTB (Local Translation Buffer)
context. The effect of this mechanism is to provide the facilities normally attributed to
segmentation. However, in this system there is no extension of the address range; instead,
segments are local nicknames for portions of the global virtual address space.

A failure to match an LTB entry results either in an exception or an access to the global
virtual address space, depending on privilege level. A single bit, selected by the privilege level
active for the access from a four-bit control register field, global access (ga), determines the
result. If ga_pl is zero (0), the failure causes an exception; if it is one (1), the failure causes the
address to be directly used as a global virtual address without modification.

Global access fields of control register

11 10 9 8
| ga3 | ga2 | ga1 | ga0 |
1 1 1 1

Usually, global access is a right conferred to highly privileged levels, so a typical system may
be configured with ga0 and ga1 clear (0), but ga2 and ga3 set (1). A single low-privilege (0)
task can be safely permitted to have global access, as accesses are further limited by the rwxg
privilege fields. A concrete example of this is an emulation task, which may use global
addresses to simulate segmentation, such as an x86 emulation. The emulation task then runs
as privilege 0, with ga0 set, while most user tasks run as privilege 1, with ga1 clear. Operating
system tasks then use privilege 2 and 3 to communicate with and control the user tasks, with
ga2 and ga3 set.

For tasks that have global access disabled at their current privilege level, failure to match an
LTB entry causes an exception. The exception handler may load an LTB entry and continue 
execution, thus providing access to an arbitrary number of local virtual address ranges. 

When failure to match an LTB entry does not cause an exception, instructions may access any
region in the local virtual address space when an LTB entry matches, and may access regions
in the global virtual address space when no LTB entry matches. This mechanism permits
privileged code to make judicious use of local virtual address ranges, which simplifies the
manner in which privileged code may manipulate the contents of a local virtual address
range on behalf of a less-privileged client. Note, however, that under this model, an LTB
miss does not cause an exception directly, so the use of more local virtual address ranges
than LTB entries requires more care: the local virtual address ranges should be selected so as
not to overlap with the global virtual address ranges, and GTB misses to LVA regions must
be detected and cause the handler to load an LTB entry.

Each thread has an independent LTB, so that threads may independently define local
translation. The size of the LTB for each thread is implementation dependent and defined as
the LE parameter in the architecture description register. LE is the log of the number of
entries in the local TB per thread; an implementation may define LE to be a minimum of 0,
meaning one LTB entry per thread, or a maximum of 3, meaning eight LTB entries per
thread. For the initial Zeus implementation, each thread has two entries and LE=1.

A minimum implementation of an LTB context is a single set of lm/la/lx/lp registers per
thread. However, the need for the LTB to translate both code addresses and data addresses
imposes some limits on the use of the LTB in such systems. We need to be able to guarantee
forward progress. With a single LTB set per thread, either the code or the data must use
global addresses, or both must use the same local address range, as must the LTB and GTB
exception handler. To avoid this restriction, the implementation must be raised to two sets
per thread, at least one for code and one for data, to guarantee forward progress for arbitrary
use of local addresses in the user code (but still be limited to using global addresses for
exception handlers).

A single-set LTB context may be further simplified by reserving the implementation of the
lm and la registers, setting them to a read-only zero value. Note that in such a configuration,
only a single LVA region can be implemented.

63 32 31 16 15 0
| 0 | lx | lp |
32 16 16

If the largest possible space is reserved for an address space identifier, the virtual address is
partitioned as shown below. Any of the bits marked as "local" below may be used as
"offset" as desired.

63 48 47 0
| local | offset |
16 48

To improve performance, an implementation may perform the LTB translation on the value
of the base register (rc) or unincremented program counter, provided that a check is
performed which prohibits changing the unmasked upper 16 bits by the add or increment. If
this optimization is provided and the check fails, an AccessDisallowedByVirtualAddress
exception should be signaled. If this optimization is provided, the architecture description
parameter LB=1. Otherwise LTB translation is performed on the local address, la, no
checking is required, and LB=0.

The LTB protect field controls the minimum privilege level required for each memory action
of read (r), write (w), execute (x), and gateway (g), as well as memory and cache attributes of
write allocate (wa), detail access (da), strong ordering (so), cache disable (cd), and write
through (wt). These fields are combined with corresponding bits in the GTB protect field to
control these attributes for the mapped memory region.

7 6 5 4 3 2 0
lp0: | 0 | 0 | 0 | da | so | cc |
1 1 1 1 1 3

15 14 13 12 11 10 9 8
lp1: | g | x | w | r |
2 2 2 2
Field Description



The meaning of the fields is given by the following table:



name   size   meaning
g      2      minimum privilege required for gateway access
x      2      minimum privilege required for execute access
w      2      minimum privilege required for write access
r      2      minimum privilege required for read access
0      1      reserved
da     1      detail access
so     1      strong ordering
cc     3      cache control



def ga,LocalProtect ← LocalTranslation(th,ba,la,pl) as
    if LB and (ba_{63..48} ≠ la_{63..48}) then
        raise AccessDisallowedByVirtualAddress
    endif
    me ← NONE
    for i ← 0 to (1 || 0^{LE})-1
        if (la_{63..48} and ~LocalTB[th][i]_{63..48}) = LocalTB[th][i]_{47..32} then
            me ← i
        endif
    endfor
    if me = NONE then
        if ~ControlRegister_{pl+8} then
            raise LocalTBMiss
        endif
        ga ← la
        LocalProtect ← 0
    else
        ga ← (la_{63..48} xor LocalTB[th][me]_{31..16}) || la_{47..0}
        LocalProtect ← LocalTB[th][me]_{15..0}
    endif
enddef
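The following is a minimal Python sketch of this lookup under these assumptions:
- the mask convention follows the definition above: a set lm bit removes that address bit from the comparison;
- entries are packed lm | la | lx | lp as shown under Entry Format;
- a single flag stands in for the ga bit of the current privilege level.

`pack_ltb_entry` and `local_translation` are hypothetical names, and the LB base-register check is omitted.

```python
from __future__ import annotations

LOW48 = (1 << 48) - 1


def pack_ltb_entry(lm: int, la: int, lx: int, lp: int) -> int:
    """Pack four 16-bit fields into a 64-bit LTB entry laid out lm | la | lx | lp."""
    return (lm << 48) | (la << 32) | (lx << 16) | lp


def local_translation(entries: list[int], la: int, global_access: bool) -> tuple[int, int]:
    """Return (global virtual address, local protect) for local virtual address la."""
    top = la >> 48
    match = None
    for e in entries:                        # as in the definition, a later match wins
        lm, lav = (e >> 48) & 0xFFFF, (e >> 32) & 0xFFFF
        if (top & ~lm & 0xFFFF) == lav:      # set mask bits drop out of the comparison
            match = e
    if match is None:
        if not global_access:                # stands in for the ga bit of this privilege level
            raise LookupError("local TB miss")
        return la, 0                         # use la unchanged as a global address
    lx, lp = (match >> 16) & 0xFFFF, match & 0xFFFF
    return ((top ^ lx) << 48) | (la & LOW48), lp
```

Only the upper 16 bits are rewritten (by XOR with lx), which is why the cache can be indexed with the untouched low 48 bits of the local address.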

Global Translation Buffer 

Global virtual addresses which fail to be accessed in either the LZC, the MTB, the BTB, or the
PTB are translated to physical references in a table, here named the "Global Translation
Buffer" (GTB).

Each processor may have one or more GTBs, with each GTB shared by one or more
threads. The parameter GT, the base-two log of the number of threads which share a GTB,
and the parameter T, the number of threads, allow computation of the number of GTBs
(T/2^GT), and the number of threads which share each GTB (2^GT).

If there are two GTBs and four threads (GT=1, T=4), GTB 0 services references from
threads 0 and 1, and GTB 1 services references from threads 2 and 3.
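Both counts are powers of two, so the mapping from thread to GTB reduces to a right shift by GT. The following is a minimal Python sketch, assuming that threads sharing a GTB are numbered consecutively as in the example above. The function names are hypothetical.

```python
def gtb_index(th: int, gt: int) -> int:
    """GTB serving thread th when 2**gt consecutive threads share one GTB."""
    return th >> gt


def gtb_count(t: int, gt: int) -> int:
    """Number of GTBs, T / 2**GT (T assumed to be a multiple of 2**GT)."""
    return t >> gt
```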
In the first implementation, there is one GTB, shared by all four threads (GT=2, T=4). The
GTB has 128 entries (G=7).

Per clock cycle, each GTB can translate one global virtual address to a physical address,
yielding protection information as a side effect.

A GTB miss causes a software trap. This trap is designed to permit a fast handler for
GlobalTBMiss to be written in software, by permitting a second GTB miss to occur as an 
exception, rather than a machine check. 

Physical address

There may be as many GTBs as threads, and up to 2^15 entries per GTB. The physical address
of a GTB entry for thread th, entry en, byte b is:

63 24 23 19 18 4 3 0
| FFFF FFFF 0C00 0000_{63..24} | th | en | b |
40 5 15 4

Note that in the diagram above, the low-order GT bits of the th value are ignored, reflecting
that 2^GT threads share a single GTB. A single GTB shared between threads appears
multiple times in the address space. GTB entries are packed together so that entries in a
GTB are consecutive:
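The following is a minimal Python sketch of the address arithmetic. It follows the rule stated above that the low-order GT bits of th are ignored, so every thread sharing a GTB reaches the same entry. The helper names are hypothetical, and range checks are omitted.

```python
from __future__ import annotations

GTB_BASE = 0xFFFF_FFFF_0C00_0000   # bits 63..24 of every GTB entry address


def gtb_entry_address(th: int, en: int, b: int) -> int:
    """Physical address of byte b of GTB entry en, as seen from thread th."""
    return GTB_BASE | (th << 19) | (en << 4) | b


def decode_gtb_address(pa: int, gt: int) -> tuple[int, int, int]:
    """Split a GTB entry address into (GTB number, entry, byte)."""
    th = (pa >> 19) & 0x1F        # bits 23..19
    en = (pa >> 4) & 0x7FFF       # bits 18..4
    b = pa & 0xF                  # bits 3..0
    return th >> gt, en, b        # low-order GT bits of th are ignored
```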



def data,flags ← AccessPhysicalGTB(pa,op,wdata) as
    th ← pa_{23..19+GT} || 0^{GT}
    en ← pa_{18..4}
    if (en < (1 || 0^{G})) and (th < T) and (pa_{18+GT..19} = 0) then
        case op of
            R:
                data ← GTBArray[th_{5..GT}][en]
            W:
                GTBArray[th_{5..GT}][en] ← wdata
        endcase
    else
        data ← 0
    endif
enddef

Entry Format 

Each GTB entry is 128 bits. The format of a GTB entry is: 
127 72 71 64
| px_{63..8} | gp1 |
56 8

63 7 6 0
| gs_{63..7} | gp0 |
57 7

6 5 4 3 2 0
gp0: | 0 | 0 | da | so | cc |
1 1 1 1 3

71 70 69 68 67 66 65 64
gp1: | g | x | w | r |
2 2 2 2

Field Description

gs = ga + size/2: 256 ≤ size ≤ 2^64; ga, the global address, is aligned to (a multiple of) size.
px = pa xor ga; pa, ga, and px are all aligned to (a multiple of) size.



The meaning of the fields are given by the following table: 



name   size   meaning
gs     57     global address with size
px     56     physical xor
g      2      minimum privilege required for gateway access
x      2      minimum privilege required for execute access
w      2      minimum privilege required for write access
r      2      minimum privilege required for read access
0      1      reserved
da     1      detail access
so     1      strong ordering
cc     3      cache control



If the entire contents of the GTB entry is zero (0), the entry will not match any global 
address at all. If a zero value is written, a zero value is read for the GTB entry. Software 
must not write a zero value for the gs field unless the entire entry is a zero value. 

It is an error to write GTB entries that multiply match any global address; all GTB entries
must have unique, non-overlapping coverage of the global address space. Hardware may
produce a machine check if such overlapping coverage is detected, or may produce any
physical address and protection information and continue execution.

Limiting the GTB entry size to 128 bits allows us to replace entries atomically (with a single store
operation), which is less complex than the previous design, in which the mask portion was first reduced, then
other entries changed, then the mask is expanded. However, it is limiting the amount of attribute information
def pa,GlobalProtect ← GlobalAddressTranslation(th,ga,pl,lda) as
    me ← NONE
    for i ← 0 to (1 || 0^{G})-1
        if GlobalTB[th_{5..GT}][i] ≠ 0 then
            size ← (GlobalTB[th_{5..GT}][i]_{63..7} and (0^{57} - GlobalTB[th_{5..GT}][i]_{63..7})) || 0^{8}
            if (((ga_{63..8} || 0^{8}) xor (GlobalTB[th_{5..GT}][i]_{63..8} || 0^{8})) and (0^{64} - size)) = 0 then
                me ← i
            endif
        endif
    endfor
    if me = NONE then
        if lda then
            PerformAccessDetail(AccessDetailRequiredByLocalTB)
        endif
        raise GlobalTBMiss
    else
        pa ← (ga_{63..8} xor GlobalTB[th_{5..GT}][me]_{127..72}) || ga_{7..0}
        GlobalProtect ← GlobalTB[th_{5..GT}][me]_{71..64} || 0^{1} || GlobalTB[th_{5..GT}][me]_{6..0}
    endif
enddef
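The gs and px encodings can be exercised with a short Python sketch. It assumes the 128-bit entry layout used by the definition above:
- px bits 63..8 in entry bits 127..72;
- gp1 in bits 71..64;
- gs in bits 63..7;
- gp0 in bits 6..0.

The sketch returns the first matching entry, since entries are required not to overlap, and omits privilege checks and the lda detail-access path. `gtb_entry` and `gtb_translate` are hypothetical names.

```python
from __future__ import annotations

GS_MASK = 0xFFFF_FFFF_FFFF_FF80        # entry bits 63..7 hold gs


def gtb_entry(ga: int, size: int, pa: int, gp1: int = 0, gp0: int = 0) -> int:
    """Build a 128-bit entry mapping [ga, ga+size) onto [pa, pa+size)."""
    if size < 256 or size > 1 << 64 or size & (size - 1):
        raise ValueError("size must be a power of two from 256 to 2**64")
    if ga % size or pa % size:
        raise ValueError("ga and pa must be multiples of size")
    gs = ga + size // 2                # global address with size
    px = pa ^ ga                       # physical xor
    return ((px >> 8) << 72) | (gp1 << 64) | gs | gp0


def gtb_translate(entries: list[int], ga: int) -> tuple[int, int]:
    """Return (physical address, global protect) for ga, or raise on a miss."""
    for e in entries:
        if e == 0:
            continue                   # an all-zero entry matches nothing
        gs = e & GS_MASK
        size = (gs & -gs) << 1         # lowest set bit of gs is size/2
        if ((ga ^ gs) & -size) == 0:   # compare only the bits above the page offset
            pa = ga ^ ((e >> 72) << 8)
            protect = (((e >> 64) & 0xFF) << 8) | (e & 0x7F)
            return pa, protect
    raise LookupError("global TB miss")
```

Storing size/2 in the low bits of gs lets one field carry both the page base and the page size. Storing pa xor ga lets translation be a single XOR on the bits above the page offset.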



GTB Registers 

Because the processor contains multiple threads of execution, even when taking virtual 
memory exceptions, it is possible for two threads to nearly simultaneously invoke software 
GTB miss exception handlers for the same memory region. In order to avoid producing 
improper GTB state in such cases, the GTB includes access facilities for indivisibly checking 
and then updating the contents of the GTB as a result of a memory write to specific 
addresses. 

A 128-bit write to the address GTBUpdateFill (fill=1), as a side effect, causes first a check of
the global address specified in the data against the GTB. If the global address check results
in a match, the data is directed to write on the matching entry. If there is no match, the
address specified by GTBLast is used, and GTBLast is incremented. If incrementing
GTBLast results in a zero value, GTBLast is reset to GTBFirst, and GTBBump is set. Note
that if the size of the updated value is not equal to the size of the matching entry, the global
address check may not adequately ensure that no other entries also cover the address range
of the updated value. The operation is unpredictable if multiple entries match the global
address.

The GTBUpdateFill register is a 128-bit memory-mapped location, to which a write
operation performs the operation defined above. A read operation returns a zero value. The
format of the GTBUpdateFill register is identical to that of a GTB entry.
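The following is a minimal Python model of the fill policy under these assumptions:
- the gs field sits in entry bits 63..7, as in the translation definition above;
- an all-zero entry matches nothing;
- GTBLast is a G-bit counter.

`covers`, `update_fill` and `fill_sequence` are hypothetical names. The GTBUpdate (no-fill) variant is not modelled.

```python
from __future__ import annotations

GS_MASK = 0xFFFF_FFFF_FFFF_FF80        # entry bits 63..7 hold gs


def covers(entry: int, gs: int) -> bool:
    """True if a nonzero entry's range contains the global address in gs."""
    if entry == 0:
        return False                   # an all-zero entry matches nothing
    egs = entry & GS_MASK
    size = (egs & -egs) << 1
    return ((gs ^ egs) & -size) == 0


def update_fill(entries: list[int], last: int, first: int, bump: int,
                data: int, g: int) -> tuple[int, int, int]:
    """Model one write to GTBUpdateFill; returns (index written, GTBLast, GTBBump).

    `entries` is updated in place.
    """
    gs = data & GS_MASK
    for i, e in enumerate(entries):
        if covers(e, gs):
            entries[i] = data          # matching entry: overwrite it
            return i, last, bump
    i = last
    entries[i] = data                  # no match: replace the entry at GTBLast
    last = (last + 1) % (1 << g)       # G-bit increment
    if last == 0:                      # wrapped: restart at GTBFirst, note the bump
        last, bump = first, 1
    return i, last, bump


def fill_sequence(datas: list[int], g: int, first: int, last: int) -> tuple[list[int], int, int]:
    """Apply several fills to an empty GTB; returns (indices, GTBLast, GTBBump)."""
    entries = [0] * (1 << g)
    bump = 0
    written = []
    for data in datas:
        i, last, bump = update_fill(entries, last, first, bump, data, g)
        written.append(i)
    return written, last, bump
```

In this model, once GTBBump is set the replacement pointer cycles from GTBFirst upward, so entries below GTBFirst are no longer chosen for replacement. They can still be updated in place by a matching write, as the last fill in the test does for entry 0.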
An alternative write address, GTBUpdate, (fill=0) updates a matching entry, but makes no
change to the GTB if no entry matches. This operation can be used to indivisibly update a
GTB entry as to protection or physical address information.

Definition

def GTBUpdateWrite(th,fill,data) as
    me ← NONE
    for i ← 0 to (1 || 0^{G})-1
        size ← (GlobalTB[th_{5..GT}][i]_{63..7} and (0^{57} - GlobalTB[th_{5..GT}][i]_{63..7})) || 0^{8}
        if (((data_{63..8} || 0^{8}) xor (GlobalTB[th_{5..GT}][i]_{63..8} || 0^{8})) and (0^{64} - size)) = 0 then
            me ← i
        endif
    endfor
    if me = NONE then
        if fill then
            GlobalTB[th_{5..GT}][GTBLast[th_{5..GT}]] ← data
            GTBLast[th_{5..GT}] ← (GTBLast[th_{5..GT}] + 1)_{G-1..0}
            if GTBLast[th_{5..GT}] = 0 then
                GTBLast[th_{5..GT}] ← GTBFirst[th_{5..GT}]
                GTBBump[th_{5..GT}] ← 1
            endif
        endif
    else
        GlobalTB[th_{5..GT}][me] ← data
    endif
enddef

Physical address

There may be as many GTBs as threads, and up to 2^11 registers per GTB (5 registers are
implemented). The physical address of a GTB control register for thread th, register rn, byte
b is:

63 24 23 19 18 8 7 4 3 0
| FFFF FFFF 0D00 0000_{63..24} | th | rn | 0 | b |
40 5 11 4 4

Note that in the diagram above, the low-order GT bits of the th value are ignored, reflecting
that 2^GT threads share single GTB registers. A single set of GTB registers shared between
threads appears multiple times in the address space, and manipulates the GTB of the threads
with which the registers are associated.

The GTBUpdate register is a 128-bit memory-mapped location, to which a write operation
performs the operation defined above. A read operation returns a zero value. The format of
the GTBUpdate register is identical to that of a GTB entry.

The registers GTBLast, GTBFirst, and GTBBump are memory mapped. The GTBLast and
GTBFirst registers are G bits wide, and the GTBBump register is one bit:
63 G G-1 0
| 0 | GTBLast |
64-G G

63 G G-1 0
| 0 | GTBFirst |
64-G G

63 1 0
| 0 | GTBBump |
63 1

def data,flags ← AccessPhysicalGTBRegisters(pa,op,wdata) as
    th ← pa_{23..19+GT} || 0^{GT}
    rn ← pa_{18..8}
    if (rn < 5) and (th < T) and (pa_{18+GT..19} = 0) and (pa_{7..4} = 0) then
        case rn || op of
            0 || R, 1 || R:
                data ← 0
            0 || W, 1 || W:
                GTBUpdateWrite(th, rn_0, wdata)
            2 || R:
                data ← 0^{64-G} || GTBLast[th_{5..GT}]
            2 || W:
                GTBLast[th_{5..GT}] ← wdata_{G-1..0}
            3 || R:
                data ← 0^{64-G} || GTBFirst[th_{5..GT}]
            3 || W:
                GTBFirst[th_{5..GT}] ← wdata_{G-1..0}
            4 || R:
                data ← 0^{63} || GTBBump[th_{5..GT}]
            4 || W:
                GTBBump[th_{5..GT}] ← wdata_0
        endcase
    else
        data ← 0
    endif
enddef

Address Generation 

The address units of each of the four threads provide up to two global virtual addresses of
load, store, or memory instructions, for a total of eight addresses. LTB units associated with
each thread translate the local addresses into global addresses. The LZC operates on global
addresses. MTB, GTB, and PTB units associated with each thread translate the global
addresses into physical addresses and cache addresses. (A PTB unit associated with each
thread produces physical addresses and cache addresses for program counter references;
this is optional, as by limiting address generation to two per thread, the MTB can be used for
program references.) Cache addresses are presented to the LOC as required, and physical
addresses are checked against cache tags as required.



Memory tenfr 



The LZC has two banks, each servicing up to four requests. The LOC has eight banks, each
servicing at most one request.

Assuming random request addresses, the following graph shows the expected rate at which
requests are serviced by multi-bank/multi-port memories that have 8 total ports and are divided
into 1, 2, 4, or 8 interleaved banks. The LZC is 2 banks, each with 4 ports, and the LOC is 8
banks, each with 1 port.



Bank Arbitration

[Graph: expected requests serviced versus applied references for 8-bank 1-port (LOC), 4-bank 2-port, 2-bank 4-port (LZC), and 1-bank 8-port memories.]

Note the small difference between applying 12 references versus 8 references for the LOC (6.5
vs. 5.2), and for the LZC (7.8 vs. 6.9). This suggests that simplifying the system to produce
two addresses per thread (program+load/store or two load/store) will not overly hurt
performance. A closer simulation, taking into account the sequential nature of the program
and load/store traffic, may well yield better numbers, as threads will tend to line up in non-
interfering patterns, and program microcaching reduces program fetching.
The following graph shows the rates for both 8 total ports and 16 total ports.

[Graph: expected requests serviced versus applied references (0 to 16) for 8-port configurations (8-bank 1-port LOC, 4-bank 2-port, 2-bank 4-port LZC, 1-bank 8-port) and 16-port configurations (16-bank 1-port, 8-bank 2-port, 4-bank 4-port, 2-bank 8-port, 1-bank 16-port).]

Note the significant differences between 8-port systems and 16-port systems, even when used
with a maximum of 8 applied references. In particular, a 16-bank 1-port system is better than
a 4-bank 2-port system with more than 6 applied references. Current layout estimates would
require about a 14% area increase (assuming no savings from smaller/simpler sense amps) to
switch to a 16-port LOC, with a 22% increase in 8-reference throughput.
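The graphs can be approximated analytically. Assuming each applied reference independently targets a uniformly random bank, a bank that receives k references services min(k, ports) of them, and by linearity of expectation the total is the sum over banks. The Python sketch below computes that expectation exactly; it is an independent model under this assumption, not the simulation used to produce the graphs.

```python
from math import comb


def expected_serviced(banks: int, ports: int, refs: int) -> float:
    """Expected requests serviced when `refs` requests pick banks uniformly at random."""
    p = 1.0 / banks
    per_bank = 0.0
    for k in range(refs + 1):
        # Probability that exactly k of the requests address this bank.
        prob = comb(refs, k) * p**k * (1.0 - p) ** (refs - k)
        # A bank can service at most `ports` of the requests that address it.
        per_bank += min(k, ports) * prob
    # Linearity of expectation: every bank has the same expected count.
    return banks * per_bank


for label, banks, ports in [("LOC 8x1", 8, 1), ("LZC 2x4", 2, 4)]:
    print(label, [round(expected_serviced(banks, ports, n), 2) for n in (8, 12)])
```

Under this model the LZC services about 6.91 and 7.81 requests for 8 and 12 applied references, and the LOC about 5.25 and 6.39, close to the values quoted in the text. The same function places the crossover between a 16-bank 1-port memory and a 4-bank 2-port memory between 6 and 7 applied references.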

Program Microcache 

A program microcache (PMC) which holds only program code for each thread may
optionally exist, and does exist for the initial implementation. The program microcache is
flushed by reset, or by executing a B.BARRIER instruction. The program microcache is
always clean, and is not snooped by writes or otherwise kept coherent, except by flushing as
indicated above. The microcache is not altered by writing to the LTB or GTB, and software
must execute a B.BARRIER instruction before expecting the new contents of the LTB or
GTB to affect determination of PMC hit or miss status on program fetches.
In the initial implementation, the program microcache holds simple loop code. The
microcache holds two separately addressed cache lines. Branches or execution beyond this
region cause the microcache to be flushed and refilled at the new address, provided that the
addresses are executable by the current thread. The program microcache uses the B.HINT
and B.HINT.I instructions to accelerate fetching of program code when possible. The program
microcache generally functions as a prefetch buffer, except that short forward or backward
branches within the region covered maintain the contents of the microcache.

Program fetches into the microcache are requested on any cycle in which fewer than two
load/store addresses are generated by the address unit, unless the microcache is already full.
System arbitration logic should give program fetches lower priority than load/store
references when first presented, then equal priority if the fetch fails arbitration a certain
number of times. The delay until program fetches have equal priority should be based on the
expected time the program fetch data will be executed; it may be as small as a single cycle, or
greater for fetches which are far ahead of the execution point.
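The aging rule for program fetches can be sketched as a priority function. In the Python sketch below, the two priority levels, the grant of a single request per call, the earliest-first tie-break, and the aging_threshold parameter are all assumptions made for illustration; the text specifies only that fetches start below load/store priority and are raised to equal priority after some number of failed arbitrations.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    is_program_fetch: bool
    failed_attempts: int = 0        # how many times this request has lost arbitration


def priority(req: Request, aging_threshold: int) -> int:
    """Load/store references have priority 1. Program fetches start at 0 and are
    promoted to 1 once they have failed arbitration aging_threshold times."""
    if not req.is_program_fetch:
        return 1
    return 1 if req.failed_attempts >= aging_threshold else 0


def arbitrate(requests: List[Request], aging_threshold: int) -> int:
    """Grant one request; ties go to the earliest request. Losers record a failure."""
    winner = max(range(len(requests)),
                 key=lambda i: (priority(requests[i], aging_threshold), -i))
    for i, req in enumerate(requests):
        if i != winner:
            req.failed_attempts += 1
    return winner
```

A small threshold suits fetches needed almost immediately; a larger one suits fetches far ahead of the execution point, matching the guidance above.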

Wide Microcache 

A wide microcache (WMC) which holds only data fetched for wide (W) instructions may 
optionally exist, and does exist for the initial implementation, for each unit which 
implements one or more wide (W) instructions. 

The wide (W) instructions each operate on a block of data fetched from memory and the
contents of one or more registers, producing a result in a register. Generally, the amount of
data in the block exceeds the maximum amount of data that the memory system can supply
in a single cycle, so caching the memory data is of particular importance. All the wide (W)
instructions require that the memory data be located at an aligned address, an address that is
a multiple of the size of the memory data, which is always a power of two.

The wide (W) instructions are performed by functional units which normally perform
execute or "back-end" instructions, though the loading of the memory data requires use of
the access or "front-end" functional units. To minimize the use of the "front-end"
functional units, special rules are used to maintain the coherence of a wide microcache
(WMC).

Execution of a wide (W) instruction has a residual effect of loading the specified memory
data into a wide microcache (WMC). Under certain conditions, a future wide (W) instruction
may be able to reuse the WMC contents.

First of all, any store or cache coherency action on the physical addresses referenced by the
WMC will invalidate the contents. The minimum translation unit of the virtual memory
system, 256 bytes, defines the number of physical address blocks which must be checked by
any store. A WMC for the W.TABLE instruction may be as large as 4096 bytes, and so
requires as many as 16 such physical address blocks to be checked for each WMC entry. A
WMC for the W.SWITCH or W.MUL* instructions need check only one address block for
each WMC entry, as the maximum size is 128 bytes.
By making these checks on the physical addresses, we do not need to be concerned about
changes to the virtual memory mapping from virtual to physical addresses, and the virtual
memory state can be freely changed without invalidating any WMC.

Absent any of the above changes, the WMC is only valid if it contains the contents relevant
to the current wide (W) instruction. To check this with minimal use of the front-end units,
each WMC entry contains a first tag with the thread and address register for which it was last
used. If the current wide (W) instruction uses the same thread and address register, it may
proceed safely. Any intervening write to that address register by that thread invalidates the
WMC thread and address register tag.

If the above test fails, the front-end is used to fetch the address register and check its
contents against a second WMC tag, with the physical addresses for which it was last used. If
the tag matches, it may proceed safely. As detailed above, any intervening stores or cache
coherency action by any thread to the physical addresses invalidates the WMC entry.
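The two-level check can be summarized in code. The Python sketch below assumes a WMC entry records its data as a set of 256-byte physical block numbers (up to 4096/256 = 16 blocks for W.TABLE, one block for W.SWITCH or W.MUL*), and uses a hypothetical translate callback in place of the front-end register read and address translation. The names and return strings are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, Optional, Tuple

BLOCK_BYTES = 256   # minimum translation unit, per the text


@dataclass
class WMCEntry:
    reg_tag: Optional[Tuple[int, int]]    # first tag: (thread, address register) of last use
    phys_blocks: FrozenSet[int]           # second tag: 256-byte physical block numbers held
    valid: bool = True

    def on_register_write(self, thread: int, reg: int) -> None:
        # A write to the tagged address register by that thread clears only the first tag.
        if self.reg_tag == (thread, reg):
            self.reg_tag = None

    def on_store(self, phys_addr: int) -> None:
        # A store or coherency action on any held block invalidates the whole entry.
        if phys_addr // BLOCK_BYTES in self.phys_blocks:
            self.valid = False


def wmc_lookup(entry: WMCEntry, thread: int, reg: int,
               translate: Callable[[int, int], Iterable[int]]) -> str:
    """Return "reg-hit", "phys-hit", or "miss" for a wide (W) instruction.

    translate(thread, reg) is a hypothetical stand-in for the front-end reading the
    address register and translating it to physical block numbers.
    """
    if not entry.valid:
        return "miss"
    if entry.reg_tag == (thread, reg):
        return "reg-hit"                      # no front-end work needed
    if frozenset(translate(thread, reg)) == entry.phys_blocks:
        entry.reg_tag = (thread, reg)         # remember the register for next time
        return "phys-hit"
    return "miss"
```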

If both of the above tests fail for all relevant WMC entries, there is no alternative but to load
the data from the virtual memory system into the WMC. The front-end units are responsible
for generating the necessary addresses to the virtual memory system to fetch the entire data
block into a WMC.

For the first implementation, it is anticipated that there be eight WMC entries for each of the
two X units (for W.SWITCH instructions), eight WMC entries for each of the two E units
(for W.MUL* instructions), and four WMC entries for the single T unit. The total number of
WMC address tags required is 8*2*1 + 8*2*1 + 4*1*16 = 96 entries.

The number of WMC address tags can be substantially reduced to 32+4=36 entries by
making an implementation restriction requiring that a single translation block be used to
translate the data address of W.TABLE instructions. With this restriction, each W.TABLE
WMC entry uses a contiguous and aligned physical data memory block, for which a single
address tag can contain the relevant information. The size of such a block is a maximum of
4096 bytes. The restriction can be checked by examining the size field of the referenced
GTB entry.

Level Zero Cache 

The innermost cache level, here named the "Level Zero Cache" (LZC), is fully associative
and indexed by global address. Entries in the LZC contain global addresses and previously
fetched data from the memory system. The LZC is an implementation feature, not visible to
the Zeus architecture.

Entries in the LZC are also used to hold the global addresses of store instructions that have
been issued, but not yet completed in the memory system. The entry may also contain
the data associated with the global address, as maintained either before or after updating
with the store data. When it contains the post-store data, results of stores may be forwarded
directly to the requested reference.
With an LZC hit, data is returned from the LZC data, and protection from the LZC tag. No
LOC access is required to complete the reference.

All loads and program fetches are checked against the LZC for conflicts with entries being
used as a store buffer. On an LZC hit on such entries, if the post-store data is present, data may
be returned by the LZC to satisfy the load or program fetch. If the post-store data is not
present, the load or program fetch must stall until the data is available.

With an LZC miss, a victim entry is selected, and if dirty, the victim entry is written to the
LOC. The LOC cache is accessed, and a valid entry is constructed from data from the
LOC and tags from the LOC protection information.

All stores are checked against the LZC for conflicts, and further cause a new entry in the
LZC, or "take over" a previously clean LZC entry for this purpose. Unaligned stores may
require two entries in the LZC. At the time of allocation, the address is filled in.

Two operations then occur in parallel: 1) for write-back cached references, the remaining
bytes of the hexlet are loaded from the LOC (or LZC), and 2) the addressed bytes are filled
in with data from the data path. If an exception causes the store to be purged before retirement,
the LZC entry is marked invalid, and not written back. When the store is retired, the LZC
entry can be written back to the LOC or external interface.

Structure

The eight memory addresses are partitioned into up to four odd addresses, and four even
addresses.

The LZC contains 16 fully associative entries that may each contain a single hexlet of data at
even hexlet addresses (LZC0), and another 16 entries for odd hexlet addresses (LZC1). The
maximum capacity of the LZC is 16*16*2=512 bytes.

The tags for these entries are indexed by global virtual address (bits 63..5), and contain access
control information, detailed below.

The address of entries accessed associatively is also encoded into binary and provided as
output from the tags for use in updating the LZC through its write ports.



4 bit rwxg
16 bit valid
16 bit dirty
4 bit address
8 bit protection
def data,protect,valid,dirty,match ← LevelZeroCacheRead(ga) as
    eo ← ga_4
    match ← NONE
    for i ← 0 to LevelZeroCacheEntries/2 - 1
        if ga_{63..5} = LevelZeroTag[eo][i] then
            match ← i
        endif
    endfor
    if match = NONE then
        raise LevelZeroCacheMiss
    else
        data ← LevelZeroData[eo][match]_{127..0}
        valid ← LevelZeroData[eo][match]_{143..128}
        dirty ← LevelZeroData[eo][match]_{159..144}
        protect ← LevelZeroData[eo][match]_{167..160}
    endif
enddef
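The lookup performed by LevelZeroCacheRead can be modeled in Python as follows. The sketch assumes Python integers for global addresses and stores each entry's fields in a dictionary rather than a packed 168-bit word; the class name is hypothetical, and the exception mirrors the LevelZeroCacheMiss raised by the pseudocode.

```python
ENTRIES_PER_BANK = 16   # 16 even-hexlet and 16 odd-hexlet entries


class LevelZeroCacheMiss(Exception):
    pass


class LZCModel:
    def __init__(self) -> None:
        # tags[eo][i] holds global address bits 63..5 (ga >> 5), or None when empty.
        self.tags = [[None] * ENTRIES_PER_BANK for _ in range(2)]
        # data[eo][i] holds the hexlet and its valid/dirty/protect fields.
        self.data = [[None] * ENTRIES_PER_BANK for _ in range(2)]

    def read(self, ga: int):
        eo = (ga >> 4) & 1          # bit 4 selects the even or odd hexlet bank
        key = ga >> 5               # bits 63..5 are compared against every tag in that bank
        match = None
        for i, tag in enumerate(self.tags[eo]):
            if tag == key:
                match = i
        if match is None:
            raise LevelZeroCacheMiss(hex(ga))
        return self.data[eo][match], match
```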

Level One Cache 

The next cache level, here named the "Level One Cache" (LOC), is four-set-associative and
indexed by the physical address. The eight memory addresses are partitioned into up to eight
addresses for each of eight independent memory banks. The LOC has a cache block size of
256 bytes, with triclet (32-byte) sub-blocks.

The LOC may be partitioned into two sections, one part used as a cache, and the remainder
used as "niche memory." Niche memory is at least as fast as cache memory, but unlike
cache, never misses to main memory. Niche memory may be placed at any virtual address,
and has physical addresses fixed in the memory map. The nl field in the control register
configures the partitioning of the LOC into cache memory and niche memory.

The LOC data memory is (256+8)x4x(128+2) bits, deep enough to hold 256 entries in each of four
sets, each entry consisting of one hexlet of data (128 bits), one bit of parity, and one spare
bit. The additional 8 entries in each of four sets hold the LOC tags, with 128 bits per entry
for 1/8 of the total cache, using 512 bytes per data memory and 4K bytes total.

There are 128 cache blocks per set, or 512 cache blocks total. The maximum capacity of the
LOC is 128k bytes. Used as a cache, the LOC is partitioned into 4 sets, each 32k bytes.
Physically, the LOC is partitioned into 8 interleaved physical blocks, each holding 16k bytes.

The physical address pa_{63..0} is partitioned as below into a 52- to 54-bit tag (three to five bits
are duplicated from the following field to accommodate use of a portion of the cache as
niche), an 8-bit address to the memory bank (7 bits are physical address (pa), 1 bit is virtual
address (v)), a 3-bit memory bank select (bn), and a 4-bit byte address (bt). All accesses to the
LOC are in units of 128 bits (hexlets), so the 4-bit byte address (bt) does not apply here. The
shaded field (pa,v) is translated via nl to a cache index (ci) and set identifier (si) and
presented to the LOC as the LOC address to LOC bank bn.
 63           15 14      8   7   6    4 3     0
|      tag      |    pa    | v  |  bn   |  bt  |
       49            7       1      3       4
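For reference, the field boundaries in this diagram can be extracted with shifts and masks. This Python sketch assumes a 64-bit physical address held in a Python integer; the dictionary keys are illustrative names, and the duplicated tag bits mentioned above are not modeled.

```python
def split_loc_address(pa: int) -> dict:
    """Split a 64-bit physical address into the LOC fields of the diagram above."""
    return {
        "tag_hi": pa >> 15,          # bits 63..15: always part of the tag
        "line": (pa >> 8) & 0x7F,    # bits 14..8: translated via nl to the cache index ci
        "v": (pa >> 7) & 1,          # bit 7: lower or upper 128 bytes of the 256-byte line
        "bn": (pa >> 4) & 0x7,       # bits 6..4: memory bank select
        "bt": pa & 0xF,              # bits 3..0: byte within the hexlet (unused by the LOC)
    }


pa = (0x1234 << 15) | (0x55 << 8) | (1 << 7) | (5 << 4) | 0xA
print(split_loc_address(pa))
```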



The LOC tag consists of 64 bits of information, including a 52- to 54-bit tag and other cache
state information. Only one MTB entry at a time may contain a LOC tag.

With 256-byte cache lines, there are 512 cache blocks. At 64 bits per tag, the cache tags
require 4k bytes of storage. This storage is adjacent to the LOC data memory itself, using
physical addresses 1024..1055. Alternatively (see detailed description below), physical
addresses 0..31 may be used.



The format of a LOC tag entry is shown below.

 63               12  11   10   9    8 7       0
|       tag        | da | vs |  mesi  |   tv    |
        52            1    1     2         8



The meanings of the fields are given by the following table:

name   size   meaning
tag    52     physical address tag
da     1      detail access (or physical address bit 11)
vs     1      victim select (or physical address bit 10)
mesi   2      coherency: modified (3), exclusive (2), shared (1), invalid (0)
tv     8      triclet valid (1) or invalid (0)
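The tag layout and field table above translate directly into pack and unpack helpers. The Python sketch below uses the bit positions from the diagram and the MESI encodings from the table; the helper names are hypothetical, and the da and vs bits are treated as plain flags (the cases where they carry physical address bits are ignored).

```python
MESI = {"I": 0, "S": 1, "E": 2, "M": 3}   # encodings from the table
MESI_NAME = {code: name for name, code in MESI.items()}


def pack_loc_tag(tag: int, da: int, vs: int, mesi: str, tv: int) -> int:
    """Pack LOC tag fields: tag 63..12, da 11, vs 10, mesi 9..8, tv 7..0."""
    if not (0 <= tag < 1 << 52 and da in (0, 1) and vs in (0, 1) and 0 <= tv < 256):
        raise ValueError("field out of range")
    return (tag << 12) | (da << 11) | (vs << 10) | (MESI[mesi] << 8) | tv


def unpack_loc_tag(word: int) -> dict:
    return {
        "tag": word >> 12,
        "da": (word >> 11) & 1,
        "vs": (word >> 10) & 1,
        "mesi": MESI_NAME[(word >> 8) & 3],
        "tv": word & 0xFF,
    }


def triclet_valid(word: int, pa: int) -> bool:
    """tv bit i covers triclet i (32 bytes) of the 256-byte line; i is pa bits 7..5."""
    return bool((word >> ((pa >> 5) & 7)) & 1)
```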



To access the LOC, a global address is supplied to the Micro-Tag Buffer (MTB), which
associatively looks up the global address in a table holding a subset of the LOC tags. In
particular, each MTB table entry contains the cache index derived from physical address bits
14..8 (ci, 7 bits) and the set identifier (si, 2 bits) required to access the LOC data. Each MTB
table entry also contains the protection information of the LOC tag.



With an MTB hit, protection information is supplied from the MTB. The MTB supplies the
resulting cache index (ci, from the MTB), set identifier (si, 2 bits), and virtual address (bit 7,
v, from the LA), which are applied to the LOC data bank selected from bits 6..4 of the LA.
The diagram below shows the address presented to LOC data bank bn.

        10  9        3 2    1  0          2     0
address | 0 |   ci    |  si  | v |   bank |   bn  |
          1      7        2    1               3



With an MTB miss, the GTB (described below) is referenced to obtain a physical address
and protection information.
To select the cache line, a 7-bit niche limit register nl is compared against the value of
pa_{14..8} from the GTB. If pa_{14..8} < nl, a 7-bit address modifier register am is inclusive-or'ed
against pa_{14..8}, producing a cache index, ci. Otherwise, pa_{14..8} is used as ci. Cache lines
0..nl-1, and cache tags 0..nl-1, are available for use as niche memory. Cache lines nl..127 and
cache tags nl..127 are used as LOC.

ci ← (pa_{14..8} < nl) ? (pa_{14..8} | am) : pa_{14..8}

The address modifier am is 1^{7-log(128-nl)} || 0^{log(128-nl)}, where log is the base-2
logarithm rounded down. The bt field specifies the least-significant bit used for tag, and is
(nl < 112) ? 12 : 8+log(128-nl):



nl 


am 


bt | 


0 


0 


12 ! 


1.64 


64 


12 


65.96 


96 


12 


97.. 112 


112 


12 


1 13..I20 


120 


It 


121.124 


124 


10 


125. 126 


126 


9 


127 


127 


8 
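The am and bt formulas can be checked against the table mechanically. The Python sketch below implements them with log taken as floor(log2), the reading that reproduces every row of the table, and also confirms that every cached access maps to a line at or above nl. The function names are illustrative.

```python
def floor_log2(x: int) -> int:
    """Base-2 logarithm rounded down, for x >= 1."""
    return x.bit_length() - 1


def niche_params(nl: int):
    """Return (am, bt) for a niche limit nl in 0..127."""
    if not 0 <= nl <= 127:
        raise ValueError("nl must be in 0..127")
    lg = floor_log2(128 - nl)
    am = ((1 << (7 - lg)) - 1) << lg      # 1^(7-lg) || 0^lg as a 7-bit value
    bt = 12 if nl < 112 else 8 + lg       # least-significant physical address bit in the tag
    return am, bt


def cache_index(pa: int, nl: int) -> int:
    """ci = (pa[14..8] < nl) ? (pa[14..8] | am) : pa[14..8]"""
    line = (pa >> 8) & 0x7F
    am, _ = niche_params(nl)
    return line | am if line < nl else line


for nl in (0, 64, 65, 113, 121, 125, 127):
    print(nl, niche_params(nl))
```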



Values for nl in the range 113..127 require more than 52 physical address tag bits in the LOC
tag and a requisite reduction in LOC features. Note that the presence of bits 14..10 of the
physical address in the LOC tag is a result of the possibility that, with am=64..127, the cache
index value ci cannot be relied upon to supply bits 14..8. Bits 9..8 can be safely inferred from
the cache index value ci, so long as nl is in the range 0..124. When nl is in the range
113..127, the da bit is used for bit 11 of the physical address, so the tag detail access bit is
suppressed. When nl is in the range 121..127, the vs bit is used for bit 10 of the physical
address, so victim selection is performed without state bits in the LOC tag. When nl is in the
range 125..127, the set associativity is decreased, so that si_1 is used for bit 9 of the physical
address, and when nl is 127, si_0 is used for bit 8 of the physical address.

Four tags are fetched from the LOC tags and compared against the PA to determine which
of the four sets contain the data. The four tags are contained in two consecutive banks; they
may be simultaneously or independently fetched. The diagram below shows the address
presented to LOC data bank (ci_{1..0} || si_1).

        10   9    5 4           0         2                  0
address | CT |  0  |  ci_{6..2}  |   bank | ci_{1..0} || si_1 |
          1     5         5                        3

Note that the CT architecture description variable is present in the above address. CT
describes whether dedicated locations exist in the LOC for tags at the next power-of-two
boundary above the LOC data. The niche-mapping mechanism can provide the storage for
the LOC tags, so the existence of these dedicated tags is optional: if CT=0, addresses at the
beginning of the LOC (0..31 for this implementation) are used for LOC tags, and the nl
value should be adjusted accordingly by software.
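The tag placement implied by the diagram can be written as a small function. This Python sketch assumes, following the tag fetch in LevelOneCacheAccess (Tag[1] is taken from bits 127..64), that each 128-bit location holds the tags of one pair of sets, with si_0 selecting the upper 64-bit half; the function name is hypothetical.

```python
def loc_tag_location(ci: int, si: int, ct: int = 1):
    """Return (LOC address, bank, half) holding the 64-bit tag of cache index ci, set si.

    address = CT || 0^5 || ci[6..2] and bank = ci[1..0] || si[1], per the diagram;
    half (si[0]) selects the lower (0) or upper (1) 64 bits of the hexlet.
    """
    address = (ct << 10) | (ci >> 2)
    bank = ((ci & 3) << 1) | ((si >> 1) & 1)
    return address, bank, si & 1
```

With CT = 1 this yields addresses 1024..1055; with CT = 0 it yields 0..31, the region that software must then exclude via nl.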
The LOC address (ci || si) uniquely identifies the cache location, and this LOC address is
associatively checked against all MTB entries on changes to the LOC tags, such as by cache
block replacement, bus snooping, or software modification. Any matching MTB entries are
flushed, even if the MTB entry specifies a different global address; this permits address
aliasing (the use of a physical address with more than one global address).

With a LOC miss, a victim set is selected (LOC victim selection is described below), whose
contents, if any sub-block is modified, are written to the external memory. A new LOC entry
is constructed with address and protection information from the GTB, and data fetched
from external memory.

The diagram below shows the contents of LOC data memory banks 0..7 for addresses
0..2047:

address  bank 7                        ...  bank 1                        bank 0
0        line 0, hexlet 7, set 0       ...  line 0, hexlet 1, set 0       line 0, hexlet 0, set 0
1        line 0, hexlet 15, set 0      ...  line 0, hexlet 9, set 0       line 0, hexlet 8, set 0
2        line 0, hexlet 7, set 1       ...  line 0, hexlet 1, set 1       line 0, hexlet 0, set 1
3        line 0, hexlet 15, set 1      ...  line 0, hexlet 9, set 1       line 0, hexlet 8, set 1
4        line 0, hexlet 7, set 2       ...  line 0, hexlet 1, set 2       line 0, hexlet 0, set 2
5        line 0, hexlet 15, set 2      ...  line 0, hexlet 9, set 2       line 0, hexlet 8, set 2
6        line 0, hexlet 7, set 3       ...  line 0, hexlet 1, set 3       line 0, hexlet 0, set 3
7        line 0, hexlet 15, set 3      ...  line 0, hexlet 9, set 3       line 0, hexlet 8, set 3
8        line 1, hexlet 7, set 0       ...  line 1, hexlet 1, set 0       line 1, hexlet 0, set 0
9        line 1, hexlet 15, set 0      ...  line 1, hexlet 9, set 0       line 1, hexlet 8, set 0
10       line 1, hexlet 7, set 1       ...  line 1, hexlet 1, set 1       line 1, hexlet 0, set 1
11       line 1, hexlet 15, set 1      ...  line 1, hexlet 9, set 1       line 1, hexlet 8, set 1
12       line 1, hexlet 7, set 2       ...  line 1, hexlet 1, set 2       line 1, hexlet 0, set 2
13       line 1, hexlet 15, set 2      ...  line 1, hexlet 9, set 2       line 1, hexlet 8, set 2
14       line 1, hexlet 7, set 3       ...  line 1, hexlet 1, set 3       line 1, hexlet 0, set 3
15       line 1, hexlet 15, set 3      ...  line 1, hexlet 9, set 3       line 1, hexlet 8, set 3
...
1016     line 127, hexlet 7, set 0     ...  line 127, hexlet 1, set 0     line 127, hexlet 0, set 0
1017     line 127, hexlet 15, set 0    ...  line 127, hexlet 9, set 0     line 127, hexlet 8, set 0
1018     line 127, hexlet 7, set 1     ...  line 127, hexlet 1, set 1     line 127, hexlet 0, set 1
1019     line 127, hexlet 15, set 1    ...  line 127, hexlet 9, set 1     line 127, hexlet 8, set 1
1020     line 127, hexlet 7, set 2     ...  line 127, hexlet 1, set 2     line 127, hexlet 0, set 2
1021     line 127, hexlet 15, set 2    ...  line 127, hexlet 9, set 2     line 127, hexlet 8, set 2
1022     line 127, hexlet 7, set 3     ...  line 127, hexlet 1, set 3     line 127, hexlet 0, set 3
1023     line 127, hexlet 15, set 3    ...  line 127, hexlet 9, set 3     line 127, hexlet 8, set 3
1024     tag line 3, sets 3 and 2      ...  tag line 0, sets 3 and 2      tag line 0, sets 1 and 0
1025     tag line 7, sets 3 and 2      ...  tag line 4, sets 3 and 2      tag line 4, sets 1 and 0
...
1055     tag line 127, sets 3 and 2    ...  tag line 124, sets 3 and 2    tag line 124, sets 1 and 0
1056     reserved                      ...  reserved                      reserved
...
2047     reserved                      ...  reserved                      reserved
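The layout in this table follows from the address format given earlier for MTB hits (0 || ci || si || v, with bank bn). The Python sketch below, whose function name is hypothetical, reproduces several of the table's cells from that format.

```python
def loc_data_location(ci: int, si: int, hexlet: int):
    """Return (LOC address, bank) of hexlet 0..15 of cache line ci in set si.

    LOC address = 0 || ci || si || v and bank = bn, where the hexlet index within the
    256-byte line is v || bn (physical address bits 7..4).
    """
    v, bn = hexlet >> 3, hexlet & 7
    return (ci << 3) | (si << 1) | v, bn


print(loc_data_location(0, 1, 0), loc_data_location(127, 3, 8))
```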
The following table summarizes the state transitions required by the LOC cache:



cc     op  mesi  tv  bus op          c  x  mesi  tv  w  m  notes
NC     R   x     x   uncached read
NC     W   x     x   uncached write
CD     R   ?     x   uncached read
CD     R   ?     ?   uncached read
CD     R   MES   1   (hit)
CD     W   ?     0   uncached write
CD     W   ?     0   uncached write
CD     W   MES   1   uncached write                     1
WT/WA  R   ?     0   triclet read    0  x
WT/WA  R   ?     0   triclet read    1  0  S     1
WT/WA  R   ?     0   triclet read    1  1  E     1
WT/WA  R   MES   0   triclet read    0  x                  inconsistent KEN#
WT/WA  R   ?     0   triclet read    1  0
WT/WA  R   ?     0   triclet read    1  1                  E->S: extra sharing
WT/WA  R   ?     0   triclet read    1  0        1
WT/WA  R   ?     0   triclet read    1  1  S               shared block
WT/WA  R   M     0   triclet read    1  0  S     1         other subblocks M->I
WT/WA  R   M     0   triclet read    1  1        1         E->M: extra dirty
WT/WA  R   MES   1   (hit)
WT     W   ?     0   uncached write
WT     W   ?     0   uncached write
WT     W   MES   1   uncached write                     1
WA     W   ?     0   triclet read    0  x           1      throwaway read
WA     W   ?     0   triclet read    1  0  S        1
WA     W   ?     0   triclet read    1  1  M
WA     W   MES   0   triclet read    0  x           1  1   inconsistent KEN#
WA     W   ?     0   triclet read    1  0  S        1
WA     W   ?     0   triclet read    1  1  M
WA     W   ?     1   write              0  S
WA     W   ?     1   write              1  S           1   E->S: extra sharing
WA     W   ?     0   triclet read    1  0  S        1
WA     W   ?     0   triclet read    1  1           1
WA     W   ?     1   (hit)           x     M               E->M: extra dirty
WA     W   M     0   triclet read    1  0  M        1
WA     W   M     0   triclet read    1  1  M
WA     W   M     1   (hit)           x     M
cc       cache control
op       operation: R=read, W=write
mesi     current mesi state
tv       current tv state
bus op   bus operation
c        cacheable (triclet) result
x        exclusive result
mesi     new mesi state
tv       new tv state
w        cacheable write after read
m        merge store data with cache line data
notes    other notes on transition



Definition

def data,tda ← LevelOneCacheAccess(pa,size,lda,gda,cc,op,wd) as
    // cache index
    am ← (1^{7-log(128-nl)} || 0^{log(128-nl)})
    ci ← (pa_{14..8} < nl) ? (pa_{14..8} | am) : pa_{14..8}
    bt ← (nl < 112) ? 12 : 8+log(128-nl)
    dt ← 0

    // fetch tags for all four sets
    tag10 ← ReadPhysical(0xFFFFFFFF00000000_{63..18} || CT || 0^5 || ci || 0 || 0^4, 128)
    Tag[0] ← tag10_{63..0}
    Tag[1] ← tag10_{127..64}
    tag32 ← ReadPhysical(0xFFFFFFFF00000000_{63..18} || CT || 0^5 || ci || 1 || 0^4, 128)
    Tag[2] ← tag32_{63..0}
    Tag[3] ← tag32_{127..64}
    vsc ← (Tag[3]_{10} || Tag[2]_{10}) xor (Tag[1]_{10} || Tag[0]_{10})

    // look for matching tag
    si ← MISS
    for i ← 0 to 3
        if (Tag[i]_{63..10} || i_{1..0} || 0^8)_{63..bt} = pa_{63..bt} then
            si ← i
        endif
    endfor

    // detail access checking on MISS
    if (si = MISS) and (lda ≠ gda) then
        if gda then
            PerformAccessDetail(AccessDetailRequiredByGlobalTB)
        else
            PerformAccessDetail(AccessDetailRequiredByLocalTB)
        endif
    endif

    // if no matching tag or invalid MESI or no sub-block, perform cacheable read/write
    bd ← (si = MISS) or (Tag[si]_{9..8} = I) or ((op = W) and (Tag[si]_{9..8} = S)) or ~Tag[si]_{pa_{7..5}}
    if bd then
        if (op = W) and (cc ≥ WA) and ((si = MISS) or ~Tag[si]_{pa_{7..5}} or (Tag[si]_{9..8} ≠ S)) then
            data,cen,xen ← AccessPhysical(pa,size,cc,R,0)
            // if cache disabled or shared, do a write through
            if ~cen or ~xen then
                data,cen,xen ← AccessPhysical(pa,size,cc,W,wd)
            endif
        else
            data,cen,xen ← AccessPhysical(pa,size,cc,op,wd)
        endif
        al ← cen
    else
        al ← 0
    endif

    // find victim set and eject from cache
    if al and ((si = MISS) or (Tag[si]_{9..8} = I)) then
        case bt of
            12, 11:
                si ← vsc
            10, 9, 8:
                gvsc ← gvsc + 1
                si ← ((bt ≤ 9) ? pa_9 : gvsc_1 xor pa_11) || ((bt ≤ 8) ? pa_8 : gvsc_0 xor pa_10)
        endcase
        if Tag[si]_{9..8} = M then
            for i ← 0 to 7
                if Tag[si]_i then
                    vca ← 0xFFFFFFFF00000000_{63..18} || 0 || ci || si || i_{2..0} || 0^5
                    vdata ← ReadPhysical(vca, 256)
                    vpa ← (Tag[si]_{63..10} || si_{1..0} || 0^8)_{63..bt} || pa_{bt-1..8} || i_{2..0} || 0^5
                    WritePhysical(vpa, 256, vdata)
                endif
            endfor
        endif
        if Tag[vsc+1]_{9..8} = I then
            nvsc ← vsc + 1
        elseif Tag[vsc+2]_{9..8} = I then
            nvsc ← vsc + 2
        elseif Tag[vsc+3]_{9..8} = I then
            nvsc ← vsc + 3
        else
            case cc of
                NC, CD, WT, WA, PF:
                    nvsc ← vsc + 1
                LS, SS:
                    nvsc ← vsc // no change
            endcase
        endif
        tda ← 0
        sm ← 0^{7-pa_{7..5}} || 1 || 0^{pa_{7..5}}
    else
        nvsc ← vsc
        tda ← (bt > 11) ? Tag[si]_{11} : 0
        if al then
            sm ← Tag[si]_{7..pa_{7..5}+1} || 1 || Tag[si]_{pa_{7..5}-1..0}
        endif
    endif

    // write new data into cache and update victim selection and other tag fields
    if al then
        if op = R then
            mesi ← xen ? E : S
        else
            mesi ← xen ? M : I // TODO
        endif
        case bt of
            12:
                Tag[si] ← pa_{63..12} || tda || (Tag[si xor 2]_{10} xor nvsc_{si_0}) || mesi || sm
                Tag[si xor 1]_{10} ← Tag[si xor 3]_{10} xor nvsc_{~si_0}
            11:
                Tag[si] ← pa_{63..11} || (Tag[si xor 2]_{10} xor nvsc_{si_0}) || mesi || sm
                Tag[si xor 1]_{10} ← Tag[si xor 3]_{10} xor nvsc_{~si_0}
            10, 9, 8:
                Tag[si] ← pa_{63..10} || mesi || sm
        endcase
        dt ← 1
        nca ← 0xFFFFFFFF00000000_{63..18} || 0 || ci || si || pa_{7..5} || 0^5
        WritePhysical(nca, 256, data)
    endif

    // retrieve data from cache
    if ~bd then
        nca ← 0xFFFFFFFF00000000_{63..18} || 0 || ci || si || pa_{7..4} || 0^4
        data ← ReadPhysical(nca, 128)
    endif

    // write data into cache
    if (op = W) and bd and al then
        nca ← 0xFFFFFFFF00000000_{63..18} || 0 || ci || si || pa_{7..4} || 0^4
        data ← ReadPhysical(nca, 128)
        mdata ← data_{127..8*(size+pa_{3..0})} || wd_{8*(size+pa_{3..0})-1..8*pa_{3..0}} || data_{8*pa_{3..0}-1..0}
        WritePhysical(nca, 128, mdata)
    endif

    // prefetch into cache
    if (al or ~bd) and ((cc = PF) or (cc = LS)) then
        af ← 0 // abort fetch if af becomes 1
        for i ← 0 to 7
            if ~Tag[si]_i and ~af then
                data,cen,xen ← AccessPhysical(pa_{63..8} || i_{2..0} || 0^5, 256, cc, R, 0)
                if cen then
                    nca ← 0xFFFFFFFF00000000_{63..18} || 0 || ci || si || i_{2..0} || 0^5
                    WritePhysical(nca, 256, data)
                    Tag[si]_i ← 1
                    dt ← 1
                else
                    af ← 1
                endif
            endif
        endfor
    endif

    // cache tag writeback if dirty
    if dt then
        nt ← Tag[si_1 || 1] || Tag[si_1 || 0]
        WritePhysical(0xFFFFFFFF00000000_{63..18} || CT || 0^5 || ci || si_1 || 0^4, 128, nt)
    endif
enddef



Physical address

The LOC data memory banks are accessed implicitly by cached memory accesses to any
physical memory location as shown above. The LOC data memory banks are also accessed
explicitly by uncached memory accesses to particular physical address ranges. The address
mapping of these ranges is designed to facilitate use of a contiguous portion of the LOC
cache as niche memory.

The physical address of a LOC hexlet for LOC address ba, bank bn, byte b is:

 63                        18 17        7 6     4 3     0
| FFFF FFFF 0000 0000 63..18  |     ba     |  bn   |   b   |
             46                     11        3       4

Within the explicit LOC data range, starting from a physical address pa_{17..0}, the diagram
below shows the LOC address (pa_{17..7}) presented to LOC data bank (pa_{6..4}).

        10           0          2         0
address |  pa_{17..7}  |   bank |  pa_{6..4} |
              11                     3
The diagram below shows the LOC data memory bank and address referenced by byte
address offsets in the explicit LOC data range. Note that this mapping includes the addresses
used for LOC tags.

offset    bank, address
0         bank 0, address 0
16        bank 1, address 0
32        bank 2, address 0
48        bank 3, address 0
64        bank 4, address 0
80        bank 5, address 0
96        bank 6, address 0
112       bank 7, address 0
128       bank 0, address 1
144       bank 1, address 1
160       bank 2, address 1
176       bank 3, address 1
192       bank 4, address 1
208       bank 5, address 1
224       bank 6, address 1
240       bank 7, address 1
...
262016    bank 0, address 2047
262032    bank 1, address 2047
262048    bank 2, address 2047
262064    bank 3, address 2047
262080    bank 4, address 2047
262096    bank 5, address 2047
262112    bank 6, address 2047
262128    bank 7, address 2047
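Both mappings can be expressed in a few lines. This Python sketch assumes the constant FFFF FFFF 0000 0000 supplies bits 63..18, as shown in the physical address diagram; the function names are illustrative.

```python
LOC_BASE = 0xFFFF_FFFF_0000_0000   # supplies physical address bits 63..18


def loc_physical_address(ba: int, bn: int, b: int = 0) -> int:
    """Physical address of byte b of the hexlet at LOC address ba in bank bn."""
    if not (0 <= ba < 2048 and 0 <= bn < 8 and 0 <= b < 16):
        raise ValueError("field out of range")
    return LOC_BASE | (ba << 7) | (bn << 4) | b


def loc_offset_to_location(offset: int):
    """Map a byte offset within the explicit LOC range to (bank, LOC address)."""
    if not 0 <= offset < 1 << 18:
        raise ValueError("offset outside the 256K-byte explicit LOC range")
    return (offset >> 4) & 7, offset >> 7
```

The tag region starting at LOC address 1024 therefore begins at offset 0x20000, so with CT = 1 the tags sit just above the 128k bytes of data.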



def data ← AccessPhysicalLOC(pa,op,wd) as
    bank ← pa6..4
    addr ← pa17..7
    case op of
        R:
            rd ← LOCArray[bank][addr]
            crc ← LOCRedundancy[bank]
            data ← (crc and rd130..2) or (~crc and rd128..0)
            p[0] ← 0
            for i ← 0 to 128 by 1
                p[i+1] ← p[i] ^ data_i
            endfor
            if ControlRegister61 and (p[129] ≠ 1) then
                raise CacheError
            endif
        W:
            p[0] ← 0
            for i ← 0 to 127 by 1
                p[i+1] ← p[i] ^ wd_i
            endfor
            wd128 ← ~p[128]
            crc ← LOCRedundancy[bank]
            rdata ← (crc126..0 and wd126..0) or (~crc126..0 and wd128..2)
            LOCArray[bank][addr] ← wd128..127 || rdata || wd1..0
    endcase
enddef

MicroUnity    Zeus System Architecture    Tue, Aug 17, 1999    Memory Management
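To make the bit steering in AccessPhysicalLOC concrete, here is a minimal Python sketch of the R and W cases on 131-bit rows, assuming the row layout given under Level One Cache Redundancy (spare1, spare0, p, data127..0). It is not the authoritative definition: the parity check is always on unless check_parity is False (the pseudocode gates it with a control-register bit), RuntimeError stands in for CacheError, returning only data127..0 is an assumption about the caller, and control_for_defect is a hypothetical helper.

```python
# Bit-level sketch of AccessPhysicalLOC. Rows are 131-bit integers
# (spare1 spare0 p data127..0); crc is the 129-bit redundancy control (pc || control).
MASK128 = (1 << 128) - 1


def bits(x: int, hi: int, lo: int) -> int:
    """Return x[hi..lo] (inclusive) as a non-negative integer."""
    return (x >> lo) & ((1 << (hi - lo + 1)) - 1)


def loc_write(wd: int, crc: int) -> int:
    """W case: add odd parity as bit 128, then move each bit whose control bit is set up by two."""
    wd &= MASK128
    wd |= ((bin(wd).count("1") & 1) ^ 1) << 128   # wd128 <- ~p[128]
    c = bits(crc, 126, 0)
    rdata = (c & bits(wd, 126, 0)) | (~c & bits(wd, 128, 2))  # row bits 128..2
    return (bits(wd, 128, 127) << 129) | (rdata << 2) | bits(wd, 1, 0)


def loc_read(rd: int, crc: int, check_parity: bool = True) -> int:
    """R case: bit i comes from row bit i+2 where crc_i is set, else from row bit i."""
    data = (crc & bits(rd, 130, 2)) | (~crc & bits(rd, 128, 0))
    if check_parity and bin(data).count("1") % 2 != 1:  # p[129] != 1
        raise RuntimeError("CacheError")  # stands in for 'raise CacheError'
    return data & MASK128  # assumption: the caller consumes data127..0


def control_for_defect(x: int) -> int:
    """Hypothetical helper: control bits x, x+2, x+4, ... up to bit 128 (pc)."""
    return sum(1 << i for i in range(x, 129, 2))


wd = 0x0123456789ABCDEF_FEDCBA9876543210
crc = control_for_defect(100) | control_for_defect(37)  # one even, one odd column
row = loc_write(wd, crc)
bad = row ^ (1 << 100) ^ (1 << 37)  # corrupt both defective columns
assert loc_read(bad, crc) == wd     # the defective columns are never read
```

The round trip only works because control bits, once set, stay set for every higher bit of the same parity, which is exactly the x, x+2, x+4, ... rule given below.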



Level One Cache Stress Control 

LOC cells may be fabricated with marginal parameters, for which changes in clock timing or
power supply voltage may cause these LOC cells to fail or pass. When testing the LOC while
the part is in a normal circuit environment, rather than a special test environment with
changeable power supply levels, cells with marginal parameters may not reliably fail testing.

To combat this problem, two bits of the control register, LOC stress, may be set to stress
the circuit environment while testing. Under normal operation, these bits are cleared (00),
while during stress testing, one or more of these bits are set (01, 10, 11). Self-testing should
be performed in each of the environment settings, and the detected failures combined
together to produce a reliable test for cells with marginal parameters.
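Combining the results of the four stress settings is just a union of failure sets. In this sketch, run_self_test is a hypothetical callable that performs one self-test pass with the LOC stress bits set to the given value and returns failing cells; identifying a cell by a (bank, bit) pair is likewise an assumption for illustration.

```python
from typing import Callable, Iterable, Set, Tuple

Cell = Tuple[int, int]  # hypothetical (bank, bit) identifier for a LOC cell


def combined_failures(run_self_test: Callable[[int], Iterable[Cell]]) -> Set[Cell]:
    """Union of the cells that fail under each LOC stress setting."""
    failures: Set[Cell] = set()
    for stress in (0b00, 0b01, 0b10, 0b11):
        failures |= set(run_self_test(stress))
    return failures


def fake_self_test(stress: int):
    """Stand-in tester: cell (2, 17) fails only under stress setting 0b10."""
    return [(2, 17)] if stress == 0b10 else []


print(combined_failures(fake_self_test))  # {(2, 17)}
```

A cell that passes at the normal setting (00) but fails at any stressed setting is still reported, which is the point of testing in every environment.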

Level One Cache Redundancy

The LOC contains facilities that can be used to avoid minor defects in the LOC data array.

Each LOC bank has three additional bits of data storage for each 128 bits of memory data
(for a total of 131 bits). One of these bits is used to retain odd parity over the 128 bits of
memory data, and the other two bits are spare, which can be pressed into service by setting a
non-zero value in the LOC redundancy control register for that bank.

Each row of a LOC bank contains 131 bits: 128 bits of memory data, one bit for parity, and
two spare bits:

bits:   130..129 | 128 | 127..0
field:  spare    | p   | data
width:  2        | 1   | 128



LOC redundancy control has 129 bits:

bits:   128 | 127..0
field:  pc  | control
width:  1   | 128



Each bit set in the control word causes the corresponding data bit to be selected from a bit
address increased by two:

output ← (data and ~control) or ((spare0 || p || data127..2) and control)

parity ← (p and ~pc) or (spare1 and pc)

The LOC redundancy control register has 129 bits, but is written with a 128-bit value. To set
the pc bit in the LOC redundancy control, a value is written to the control with either bit
124 set (1) or bit 126 set (1). To set bit 124 of the LOC redundancy control, a value is
written to the control with both bit 124 set (1) and bit 126 set (1). When the LOC redundancy
control register is read, the process is reversed by selecting the pc bit instead of control bit
124 for the value of bit 124 if control bit 126 is zero (0).

This system can remove one defective column at an even bit position and one defective
column at an odd bit position within each LOC block. For each defective column location
x, LOC control bits must be set at bits x, x+2, x+4, x+6, ... If the defective column is in the
parity location (bit 128), then set bit 124 only in the value written. The following table defines
the control bits for parity, bit 126 and bit 124 (other control bits are the same as the values written):



value 126 | value 124 | pc | control 126 | control 124
0         | 0         | 0  | 0           | 0
0         | 1         | 1  | 0           | 0
1         | 0         | 1  | 1           | 0
1         | 1         | 1  | 1           | 1
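The table, together with the read rule in the preceding paragraph, can be checked with a small encoder and decoder. This is a sketch assuming the 129-bit register is modeled as pc in bit 128 above control127..0; written_value_for_defect is a hypothetical helper that applies the "bits x, x+2, x+4, ..." rule and the "bit 124 only" rule for the parity column.

```python
MASK128 = (1 << 128) - 1


def redundancy_write(wd: int) -> int:
    """128-bit written value -> 129-bit register (pc || control127..0)."""
    w126, w124 = (wd >> 126) & 1, (wd >> 124) & 1
    control = (wd & MASK128 & ~(1 << 124)) | ((w126 & w124) << 124)
    return ((w126 | w124) << 128) | control


def redundancy_read(rd: int) -> int:
    """129-bit register -> 128-bit read value; bit 124 reads as pc when control bit 126 is 0."""
    bit124 = (rd >> 124) & 1 if (rd >> 126) & 1 else (rd >> 128) & 1
    return (rd & MASK128 & ~(1 << 124)) | (bit124 << 124)


def written_value_for_defect(x: int) -> int:
    """Hypothetical helper: value to write to repair a defective column x (0..128)."""
    if x == 128:                                  # parity column: set bit 124 only
        return 1 << 124
    return sum(1 << i for i in range(x, 128, 2))  # bits x, x+2, ... below 128


reg = redundancy_write(written_value_for_defect(100))
print(bin(reg >> 124))  # 0b10101: control 124, control 126 and pc all set
```

Packing pc into the written value this way lets a 128-bit store express all 129 control bits, because a valid repair pattern never needs bit 124 set without bit 126 unless the defect is the parity column itself.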



The LOC redundancy controls are accessed explicitly by uncached memory accesses to
particular physical address ranges.

The physical address of a LOC redundancy control for LOC bank bn, byte b is:

bits:   63..7                               | 6..4 | 3..0
field:  FFFF FFFF 0900 0000 (bits 63..7)    | bn   | b
width:  57                                  | 3    | 4



def data ← AccessPhysicalLOCRedundancy(pa,op,wd) as
    bank ← pa6..4
    case op of
        R:
            rd ← LOCRedundancy[bank]
            data ← rd127..125 || (rd126 ? rd124 : rd128) || rd123..0
        W:
            rd ← (wd126 or wd124) || wd127..125 || (wd126 and wd124) || wd123..0
            LOCRedundancy[bank] ← rd
    endcase
enddef

Memory Attributes 

Fields in the LTB, GTB and cache tag control various attributes of the memory access in the
specified region of memory. These include the control of cache consultation, updating,
allocation, prefetching, coherence, ordering, victim selection, and detail access.
Cache Control

The cache may be used in one of seven ways, depending on a three-bit cache control field (cc)
in the LTB and GTB, which may be set to one of seven states: NC, CD, WT, WA, PF, SS, and
LS (the value 3 is reserved):

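State               | cc | read             | write            | read/write
                    |    | consult allocate | consult allocate | first-victim prefetch
No Cache (NC)       | 0  | No      No       | No      No       | No           No
Cache Disable (CD)  | 1  | Yes     No       | Yes     No       | No           No
Write Through (WT)  | 2  | Yes     Yes      | Yes     No       | No           No
reserved            | 3  |                  |                  |
Write Allocate (WA) | 4  | Yes     Yes      | Yes     Yes      | No           No
Prefetch (PF)       | 5  | Yes     Yes      | Yes     Yes      | No           Yes
SubStream (SS)      | 6  | Yes     Yes      | Yes     Yes      | Yes          No
LineStream (LS)     | 7  | Yes     Yes      | Yes     Yes      | Yes          Yes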



The Zeus processor controls cc as an attribute in the LTB and GTB, thus software may set
this attribute for certain address ranges and clear it for others. The maximum of the
three-bit cache control field (cc) values of the LTB and GTB indicates the choice of caching,
according to the table above.
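Selecting the policy is a single max over the two translation levels. The sketch below assumes the encodings in the table above; the text does not say what happens when the maximum is the reserved value 3, so this sketch simply rejects it.

```python
# cc encodings from the table above (3 is reserved).
CC_STATES = {0: "NC", 1: "CD", 2: "WT", 4: "WA", 5: "PF", 6: "SS", 7: "LS"}


def effective_cache_control(ltb_cc: int, gtb_cc: int) -> str:
    """The larger of the LTB and GTB cc values selects the caching policy."""
    cc = max(ltb_cc, gtb_cc)
    if cc not in CC_STATES:
        # Behavior for a reserved result is not specified here; reject it.
        raise ValueError(f"reserved cache control value {cc}")
    return CC_STATES[cc]


print(effective_cache_control(2, 4))  # WA: the GTB's WA outranks the LTB's WT
```

One consequence of taking the maximum is that neither translation level can lower the caching policy below the value the other level selects.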



No Cache 

No Cache (NC) is an attribute that can be set on an LTB or GTB translation region to
indicate that the cache is not to be consulted. No changes to the cache state result
from reads or writes with this attribute set (except for accesses that directly address the
cache via a memory-mapped region).

Cache Disable

Cache Disable (CD) is an attribute that can be set on an LTB or GTB translation region to
indicate that the cache is to be consulted and updated for cache lines which are already
present, but no new cache lines or sub-blocks are to be allocated when the cache does not
already contain the addressed memory contents.

The "Socket 7 M bus also provides a mechanism for supporting chip sets to decide on each 
access whether data is to be cached, using the CACHE* and KEN# signals. Using these 
signals, external hardware may cause a region selected as WT, WA or PF to be treated as 
CD. This mechanism is only active on the first such access to a memory region if caching is 
enabled, as the cache may satisfy subsequent references without a bus transaction. 

Write Through 

Write Through (WT) is an attribute that can be set on an LTB or GTB translation region to
indicate that writes to the cache must also immediately update backing memory. Reads to
addressed memory that is not present in the cache cause cache lines or sub-blocks to be
allocated. Writes to addressed memory that is not present in the cache do not modify
cache state.

The "Socket 7" bus also provides a mechanism for supporting chip sets to decide on each 
access whether data is to be written through, using Ac PUT and WB/WT# signals. Using 
these signals, external hardware may cause a region selected as WA or PF to be treated as 
WI . This mechanism is only active on the first write to each region of memory; as on 
subsequent references, if the cache fine is in the Exclusive or Modified state and writeback 
caching is enabled on the first reference, no subsequent bus operation occurs, at least until 
the cache line is flushed. 

Write Allocate

Write allocate (WA) is an attribute that can be set on an LTB or GTB translation region to
indicate that the processor is to allocate a memory block to the cache when the data is not
previously present in the cache and the operation to be performed is a store. Reads to
addressed memory that is not present in the cache cause cache lines or sub-blocks to be
allocated. For cacheable data, write allocate is generally the preferred policy, as allocating the
data to the cache reduces further bus traffic for subsequent references (loads or stores) of
the data. Write allocate never occurs for data which is not cached. A write allocate brings
the data immediately into the Modified state.

Other "socket 7" processors have the ability to inhibit write allocate to cached locations 
under certain conditions, related by the address range. K6, for example, can inhibit write 
allocate in the range of 1 5- 1 6Mbytc, or for all addresses above a configurable limit with 
4Mbyte granularity. Pentium has the ability- to label address ranges over which write allocate 
can be inhibited 

Prefetch

Prefetch (PF) is an attribute that can be set on an LTB or GTB translation region to indicate
that increased prefetching is appropriate for references in this region. Each program fetch,
load or store to a cache line that does not already contain all the sub-blocks causes a
prefetch allocation of the remaining sub-blocks. Cache misses cause allocation of the
requested sub-block and prefetch allocation of the remaining sub-blocks. Prefetching does
not necessarily fill in the entire cache line, as prefetch memory references are performed at a
lower priority than other cache and memory reference traffic. A limited number of prefetches
(as low as one in the initial implementation) can be queued; the older prefetch requests are
terminated as new ones are created.
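The queueing rule, "older prefetch requests are terminated as new ones are created", maps naturally onto a bounded deque that discards from the old end. This is a sketch under stated assumptions: a request is modeled as a (line, missing sub-blocks) pair, and 8 sub-blocks per 256-byte line follows the triclet organization described under Cache Coherence; the class and method names are hypothetical.

```python
from collections import deque


class PrefetchQueue:
    """Sketch: at most `depth` pending prefetch requests; a new request
    terminates the oldest one when the queue is full."""

    def __init__(self, depth: int = 1) -> None:
        self.pending = deque(maxlen=depth)  # deque drops from the left when full

    def on_reference(self, line: int, present: set, subblocks: int = 8) -> None:
        """Queue prefetch of the sub-blocks of `line` that are not yet present."""
        missing = frozenset(range(subblocks)) - frozenset(present)
        if missing:
            self.pending.append((line, missing))

    def next_fill(self):
        """Oldest surviving request, to be issued when higher-priority traffic allows."""
        return self.pending.popleft() if self.pending else None


q = PrefetchQueue(depth=1)
q.on_reference(0x1000, present={0})
q.on_reference(0x2000, present={0, 1, 2, 3, 4, 5, 6})  # terminates the 0x1000 request
print(q.next_fill())  # (8192, frozenset({7}))
```

With depth 1, matching the minimal initial implementation, only the most recent line's prefetch survives, so the line whose sub-blocks were dropped is simply left partially filled.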

In other respects, the PF attribute is handled in the manner of the WA attribute. Prefetching
is considered an implementation-dependent feature, and an implementation may choose to
implement a region with the PF attribute exactly as with the WA attribute.

Implementations may perform even more aggressive prefetching in future versions. Data
may be prefetched into the cache in regions that are cacheable, as a result of program
fetches, loads or stores to nearby addresses. Prefetches may extend beyond the cache line
associated with the nearby address. Prefetches shall not occur beyond the reach of the GTB
entry associated with the nearby address. Prefetching is terminated if an attempted cache fill
results in a bus response that is not cacheable. Prefetches are implementation-dependent
behavior, and such behavior may vary as a result of other memory references or other bus
activity.

SubStream

SubStream (SS) is an attribute that can be set on an LTB or GTB translation region to
indicate that references in this region are to be selected as the next victim on a cache miss. In
particular, cache misses, which normally place the cache line in the last-to-be-victim state,
instead place the cache line in the first-to-be-victim state, except relative to cache lines in the
I state.

In other respects, the SS attribute is handled in the manner of the WA attribute. SubStream
is considered an implementation-dependent feature, and an implementation may choose to
implement a region with the SS attribute exactly as with the WA attribute.

The SubStream attribute is appropriate for regions which are large data structures in which
the processor is likely to reference the memory data just once or a small number of times,
but for which the cache permits the data to be fetched using burst transfers. By making it a
priority for victimization, these references are less likely to interfere with caching of data for
which the cache performs a longer-term storage function.

LineStream 

LineStream (LS) is an attribute that can be set on an LTB or GTB translation region to
indicate that references in this region are to be selected as the next victim on a cache miss,
and to enable prefetching. In particular, cache misses, which normally place the cache line in
the last-to-be-victim state, instead place the cache line in the first-to-be-victim state, except
relative to cache lines in the I state.

In other respects, the LS attribute is handled in the manner of the PF attribute. LineStream
is considered an implementation-dependent feature, and an implementation may choose to
implement a region with the LS attribute exactly as with the PF or WA attributes.

Like the SubStream attribute, the LineStream attribute is particularly appropriate for regions
for which large data structures are used in sequential fashion. By prefetching the entire cache
line, memory traffic is performed as large sequential bursts of at least 256 bytes, maximizing
the available bus utilization.



Cache Coherence

Cache coherency is maintained using the MESI protocol: for each cache line (256
bytes), the cache data is kept in one of four states: M, E, S, I:



State     | mesi | this cache data                                      | other cache data                    | memory data
Modified  | 3    | Data is held exclusively in this cache.              | No data is present in other caches. | The contents of main memory are now invalid.
Exclusive | 2    | Data is held exclusively in this cache.              | No data is present in other caches. | Data is the same as the contents of main memory.
Shared    | 1    | Data is held in this cache, and possibly others.     | Data is possibly in other caches.   | Data is the same as the contents of main memory.
Invalid   | 0    | No data for this location is present in the cache.   | Data is possibly in other caches.   | Data is possibly present in main memory.



The state is contained in the mesi field of the cache tag.

In addition, because the "Socket 7" bus performs block transfers and cache coherency
actions on triclet (32-byte) blocks, each cache line also maintains 8 bits of triclet valid (tv)
state. Each bit of tv corresponds to a triclet sub-block of the cache line: bit 0 for bytes 0..31,
bit 1 for bytes 32..63, bit 2 for bytes 64..95, etc. If the tv bit is zero (0), the coherence state
for that triclet is I, no matter what the value of the mesi field. If the tv bit is one (1), the
coherence state is defined by the mesi field. If all the tv bits are cleared (0), the mesi field
must also be cleared, indicating an invalid cache line.
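The effective state of any byte follows directly from the 2-bit mesi field and the 8 tv bits. A minimal sketch, assuming the mesi encodings from the table above (M=3, E=2, S=1, I=0); the function names are illustrative.

```python
MESI_NAMES = {3: "M", 2: "E", 1: "S", 0: "I"}


def triclet_state(mesi: int, tv: int, byte_offset: int) -> str:
    """Effective MESI state of the triclet holding byte_offset (0..255) of a line."""
    if not 0 <= byte_offset < 256:
        raise ValueError("offset outside the 256-byte line")
    t = byte_offset // 32               # tv bit 0: bytes 0..31, bit 1: 32..63, ...
    return MESI_NAMES[mesi] if (tv >> t) & 1 else "I"


def line_is_consistent(mesi: int, tv: int) -> bool:
    """If every tv bit is clear, the mesi field must be I (0)."""
    return tv != 0 or mesi == 0


print(triclet_state(3, 0b10, 40), triclet_state(3, 0b10, 10))  # M I
```

All valid triclets of a line therefore share one state; only the choice between that state and I is made per triclet.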

Cache coherency activity generally follows the protocols defined by the "Socket 7" bus, as 
defined by Pentium and K6-2 documentation. However, because the coherence state of a 
cache line is represented in only 10 bits per 256 bytes (1.25 bits per triclct), a few state 
transistions are defined differently. The differences are a direct result of attempts to set 
tridets within a cache line to different MRS states that cannot be represented. The data 
structure allows any triclct to be changed to the I state, so state transitions in this direction 
match the Pentium processor exactly. 

On the Pentium processor, for a cache line in the M state, an external bus Inquiry cycle that
does not require invalidation (INV=0) places the cache line in the S state. On the Zeus
processor, if no other triclet in the cache line is valid, the mesi field is changed to S. If other
triclets in the cache line are valid, the mesi field is left unchanged, and the tv bit for this
triclet is turned off, effectively changing it to the I state.

On the Pentium processor, for a cache line in the E state, an external bus Inquiry cycle that
does not require invalidation (INV=0) places the cache line in the S state. On the Zeus
processor, the mesi field is changed to S. If other triclets in the cache line are valid, the
MESI state is effectively changed to the S state for these other triclets.
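The two inquiry-cycle rules above can be written as one transition function on (mesi, tv). This sketch models only the tag update: the write-back of a modified triclet is not shown, and treating an inquiry that hits an already-invalid triclet as a no-op is an assumption.

```python
M, E, S, I = 3, 2, 1, 0  # mesi field encodings from the table above


def inquiry_no_invalidate(mesi: int, tv: int, t: int):
    """New (mesi, tv) after an INV=0 inquiry hits triclet t of a line."""
    if not (tv >> t) & 1:
        return mesi, tv                  # triclet already effectively I (assumed no-op)
    if mesi == M:
        others = tv & ~(1 << t)
        if others == 0:
            return S, tv                 # only valid triclet: whole line becomes S
        return M, others                 # keep M for the others, drop this triclet to I
    if mesi == E:
        return S, tv                     # every valid triclet becomes effectively S
    return mesi, tv                      # S stays S


print(inquiry_no_invalidate(M, 0b11, 0))  # (3, 2)
```

The M case shows the representational limit: one line cannot hold an S triclet next to M triclets, so the inquired triclet is dropped to I instead.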
On the Pentium processor, for a cache line in the S state, an internal store operation causes a
write-through cycle and a transition to the E state. On the Zeus processor, the mesi field is
changed to E. Other triclets in the cache line are invalidated by clearing the tv bits; the MESI
state is effectively changed to the I state for these other triclets.

When allocating data into the cache due to a store operation, data is brought immediately
into the Modified state, setting the mesi field to M. If the previous mesi field is S, other
triclets which are valid are invalidated by clearing the tv bits. If the previous mesi field is E,
other triclets are kept valid and therefore changed to the M state.

When allocating data into the cache due to a load operation, data is brought into the Shared
state if another processor reports that the data is present in its cache or the mesi field is
already set to S; into the Exclusive state if no processor reports that the data is present in its
cache and the mesi field is currently E or I; or into the Modified state if the mesi field is already
set to M. The determination is performed by driving PWT low and checking whether
WB/WT# is sampled high; if so, the line is brought into the Exclusive state. (See page 202
(184) of the K6-2 documentation.)
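The store-hit and allocation rules of the last three paragraphs can be sketched as three small transition functions on (mesi, tv). Assumptions: reported_elsewhere abstracts the PWT/WB/WT# bus response, and for the case the text leaves open (mesi already M while another cache reports the data) this sketch lets M take precedence.

```python
M, E, S, I = 3, 2, 1, 0  # mesi field encodings


def store_to_shared(mesi: int, tv: int, t: int):
    """Internal store hitting valid triclet t of an S line (after the write-through)."""
    assert mesi == S and (tv >> t) & 1
    return E, 1 << t                     # line -> E, other triclets invalidated


def allocate_on_store(mesi: int, tv: int, t: int):
    """Store allocates triclet t: it enters M immediately."""
    if mesi == S:
        tv = 0                           # other valid triclets invalidated
    return M, tv | (1 << t)              # from E (or M), other triclets become M


def allocate_on_load(mesi: int, tv: int, t: int, reported_elsewhere: bool):
    """Load allocates triclet t; reported_elsewhere models the bus response."""
    if mesi == M:                        # assumption: M takes precedence
        new = M
    elif reported_elsewhere or mesi == S:
        new = S
    else:                                # mesi is E or I and no other cache has it
        new = E
    return new, tv | (1 << t)


print(allocate_on_store(S, 0b110, 0))  # (3, 1)
```

In every rule the new triclet joins the line's single shared state, and any previously valid triclet that cannot legally share that state is invalidated rather than left inconsistent.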

Strong Ordering 

Strong ordering (so) is an attribute which permits certain memory regions to be operated
with strong ordering, in which all memory operations are performed exactly in the order
specified by the program, and others to be operated with weak ordering, in which some
memory operations may be performed out of program order.

The Zeus processor controls strong ordering as an attribute in the LTB and GTB, thus
software may set this attribute for certain address ranges and clear it for others. A one-bit
field indicates the choice of access ordering. A one (1) bit indicates strong ordering, while a
zero (0) bit indicates weak ordering.

With weak ordering, the memory system may retain store operations in a store buffer
indefinitely for later storage into the memory system, or until a synchronization operation to
any address performed by the thread that issued the store operation forces the store to
occur. Load operations may be performed in any order, subject to requirements that they be
performed logically subsequent to prior store operations to the same address, and
subsequent to prior synchronization operations to any address. Under weak ordering it is
permitted to forward results from a retained store operation to a future load operation to the
same address. Operations are considered to be to the same address when any bytes of the
operation are in common. Weak ordering is usually appropriate for conventional memory
regions, which are side-effect free.
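The weak-ordering rules (retained stores, forwarding to later loads, "same address" meaning any byte in common, draining on synchronization) can be sketched as a byte-granular store buffer. This is an illustrative model only, not the Zeus memory system; the StoreBuffer class and its methods are hypothetical names.

```python
from dataclasses import dataclass, field


def overlaps(a: int, a_size: int, b: int, b_size: int) -> bool:
    """'Same address' in the sense above: at least one byte in common."""
    return a < b + b_size and b < a + a_size


@dataclass
class StoreBuffer:
    """Sketch of weak ordering: stores wait here until a synchronization."""
    pending: list = field(default_factory=list)  # (address, data), oldest first

    def store(self, addr: int, data: bytes) -> None:
        self.pending.append((addr, bytes(data)))

    def load(self, memory: bytearray, addr: int, size: int) -> bytes:
        out = bytearray(memory[addr:addr + size])
        for s_addr, s_data in self.pending:  # later stores overwrite earlier ones
            if overlaps(addr, size, s_addr, len(s_data)):
                for i in range(size):
                    j = addr + i - s_addr
                    if 0 <= j < len(s_data):
                        out[i] = s_data[j]  # forward retained store data
        return bytes(out)

    def synchronize(self, memory: bytearray) -> None:
        """Drain every retained store, in program order."""
        for s_addr, s_data in self.pending:
            memory[s_addr:s_addr + len(s_data)] = s_data
        self.pending.clear()


mem = bytearray(16)
sb = StoreBuffer()
sb.store(4, b"\x01\x02")
print(sb.load(mem, 3, 4))  # b'\x00\x01\x02\x00' (forwarded; memory still zero)
sb.synchronize(mem)
```

Forwarding makes the thread's own loads see its retained stores, so reordering is only observable to other agents, which is why strong ordering is needed where stores have side effects.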

With strong ordering, the memory system must perform load and store operations in the
order specified. In particular, strong-ordered load operations are performed in the order
specified, and all load operations (whether weak or strong) must be delayed until all previous
strong-ordered store operations have been performed, which can have a significant
performance impact. Strong ordering is often required for memory-mapped I/O regions,
where store operations may have a side effect on the value returned by loads to other
addresses. Note that Zeus has memory-mapped I/O, such as the TB, for which the use of
strong ordering is essential to proper operation of the virtual memory system.

The FAX BF* s:gnal in "Socket w is of importance in maintaining strong ordering When a 
wntc :s performed with the signal inactive, no further writes to F. or M state lines may mrcur 
untu the signal becomes active. Further details are given in Pentium documentation (K6 2 
d« icmentarion may not apply to this signal) 

Victim Selection 

One bit of the cache tag, the vs bit, controls the selection of which set of the four sets at a
cache address should next be chosen as a victim for cache line replacement. Victim
selection (vs) is an attribute associated with LOC cache blocks. No vs bits are present in the
LTB or GTB.

There are two hexlets of tag information for a cache line, and replacement of a set requires
writing only one hexlet. To update priority information for victim selection by writing only
one hexlet, information in each hexlet is combined by an exclusive-or. It is the nature of the
exclusive-or function that altering either of the two hexlets can change the priority
information.

Full victim selection ordering for four sets 

There are 4*3*2*1 = 24 possible orderings of the four sets, which can be completely encoded in as few as 5
bits: 2 bits to indicate highest priority, 2 bits for second-highest priority, 1 bit for third-highest priority, and 0
bits for lowest priority. Dividing this up per set and duplicating per hexlet with the exclusive-or scheme above
suggests simply keeping track of the three highest-priority sets with 2 bits each, 6 bits in all, which is
three bits per set.

Specifically, the vs bits from the four sets are combined to produce a 6-bit value:

vsc ← (vs3 || vs2) ^ (vs1 || vs0)

The highest priority for replacement is set vsc1..0, second-highest priority is set vsc3..2, third-highest priority
is set vsc5..4, and lowest priority is set vsc5..4 ^ vsc3..2 ^ vsc1..0 (the remaining set, since 0 ^ 1 ^ 2 ^ 3 = 0).
When the highest-priority set is replaced, it becomes the new lowest priority and the others are moved up,
computing a new vsc by:

vsc ← (vsc5..4 ^ vsc3..2 ^ vsc1..0) || vsc5..2

U "h*tt r>:,U\im ft vsc f'T a I jn? Stream u Sub'tream replacement the pnorxft for repLicement is 
un* anj'd unit is anoth * set ihntmns the invalid Ml . M state, computing a new vsc A): 

vsc— mcsi vsa j' v*a yvsct. o - r f ' - cs ^vsc^ f vsci a \ | Wf„: 
(mcsi %'scs,j = // r* vsci o I I 
(mcsi vsc? j -!) ' vsc;.j\ \ vscij}\ \ wr^.j: 
isc 
Cache flushing and invalidations can cause cache lines to be cleared out of sequential order. Flushing or
invalidating a cache line moves that set to highest priority. If that set is already highest priority, the vsc is
unchanged. If the set was second- or third-highest or lowest priority, the vsc is changed to move that set to
highest priority, moving the others down, where fs is the flushed or invalidated set:

vsc ← ((fs = vsc1..0) or (fs = vsc3..2) ? vsc5..4 : vsc3..2) || (fs = vsc1..0 ? vsc3..2 : vsc1..0) || fs
When updating the hexlet containing vs1 and vs0, the new values of vs1 and vs0 are:

vs1 ← vs3 ^ vsc5..3
vs0 ← vs2 ^ vsc2..0

When updating the hexlet containing vs3 and vs2, the new values of vs3 and vs2 are:

vs3 ← vs1 ^ vsc5..3
vs2 ← vs0 ^ vsc2..0

Software must initialize the vs bits to a legal, consistent state. For example, to set the priority (highest to
lowest) to (0, 1, 2, 3), vsc must be set to 0b100100. There are many legal solutions that yield this vsc
value, such as vs3 ← 0, vs2 ← 0, vs1 ← 4, vs0 ← 4.
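The full four-set scheme is easiest to check by treating vsc as the list of the three highest-priority sets. The sketch below, assuming the field order described above (vsc1..0 highest, vsc3..2 second, vsc5..4 third) and 3-bit vs values, implements the normal replacement, the LineStream/SubStream replacement, the flush update, and the single-hexlet rewrite. All function names are illustrative, and `invalid` is a caller-supplied predicate standing in for the mesi lookup.

```python
def vsc_field(vsc: int, k: int) -> int:
    """2-bit field k of vsc: k=0 is vsc1..0, k=1 is vsc3..2, k=2 is vsc5..4."""
    return (vsc >> (2 * k)) & 3


def priority_order(vsc: int) -> list:
    """Sets from highest to lowest replacement priority."""
    a, b, c = vsc_field(vsc, 0), vsc_field(vsc, 1), vsc_field(vsc, 2)
    return [a, b, c, a ^ b ^ c]          # lowest is the set not named in vsc


def make_vsc(top3: list) -> int:
    """Pack the three highest-priority sets back into a 6-bit vsc."""
    return (top3[2] << 4) | (top3[1] << 2) | top3[0]


def vsc_from_vs(vs3: int, vs2: int, vs1: int, vs0: int) -> int:
    """vsc <- (vs3 || vs2) ^ (vs1 || vs0), each vs being 3 bits."""
    return ((vs3 << 3) | vs2) ^ ((vs1 << 3) | vs0)


def is_legal(vsc: int) -> bool:
    return len(set(priority_order(vsc))) == 4


def after_replace(vsc: int) -> int:
    """Normal replacement: the highest set becomes lowest; the others move up."""
    return make_vsc(priority_order(vsc)[1:])


def after_stream_replace(vsc: int, invalid) -> int:
    """LineStream/SubStream: keep the order unless another set is invalid."""
    h, s2, s3, lo = priority_order(vsc)
    if invalid(s2):
        return make_vsc([s2, s3, lo])
    if invalid(s3):
        return make_vsc([s3, lo, h])
    if invalid(lo):
        return make_vsc([lo, h, s2])
    return vsc


def after_flush(vsc: int, fs: int) -> int:
    """Flushed/invalidated set fs moves to highest priority, others move down."""
    rest = [s for s in priority_order(vsc) if s != fs]
    return make_vsc([fs] + rest[:2])


def rewrite_low_hexlet(vs3: int, vs2: int, new_vsc: int):
    """New (vs1, vs0) when only the hexlet holding vs1 and vs0 is written."""
    return vs3 ^ (new_vsc >> 3), vs2 ^ (new_vsc & 7)


vsc = vsc_from_vs(0, 0, 4, 4)
print(bin(vsc), priority_order(vsc))   # 0b100100 [0, 1, 2, 3]
```

Each bit-level formula above is the packed form of one of these list operations, which is a convenient way to check a formula against the intended reordering.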

Simplified victim selection ordering for four sets

However, the orderings are simplified in the first Zeus implementation to reduce the
number of vs bits to one per set, keeping a two-bit vsc state value:

vsc ← (vs[3] || vs[2]) ^ (vs[1] || vs[0])

I"hc highest prior ty tor replacement is set vsc, second highest priority is set vsc+1, third 
highest priority :s set vsc~2, and lowest priority is ^"hen the highest priority set is 

replaced, it becomes the new lowest priority and the others are moved up. Priority is given to 
sets with invalid state, computing a new vsc by: 

vsc«— mcsi|vsc*1] = I) ?vsc + I : 
(mcsi'vsc^l^I) ? vsc + 2 : 
(mcsi|vsc*3|=I) ? vsc + 3 : 
vsc + 1 

When replacing set vsc for a LineStream or SubStream replacement, the priority for
replacement is unchanged, unless another set contains the invalid MESI state, computing a
new vsc by:

vsc ← (mesi[vsc+1] = I) ? vsc + 1 :
      (mesi[vsc+2] = I) ? vsc + 2 :
      (mesi[vsc+3] = I) ? vsc + 3 :
      vsc
Cache flushing and invalidations can cause cache sets to be cleared out of sequential order. If
the current highest priority for replacement is a valid set, the flushed or invalidated set fs is
made highest priority for replacement:

vsc ← (mesi[vsc] = I) ? vsc : fs

When updating the hexlet containing vs[1] and vs[0], the new values of vs[1] and vs[0] are:

vs[1] ← vs[3] ^ vsc1
vs[0] ← vs[2] ^ vsc0

When updating the hexlet containing vs[3] and vs[2], the new values of vs[3] and vs[2] are:

vs[3] ← vs[1] ^ vsc1
vs[2] ← vs[0] ^ vsc0

Software must initialize the vs bits, but any state is legal. For example, to set the priority
(highest to lowest) to (0, 1, 2, 3), vsc must be set to 0b00. There are many legal solutions that
yield this vsc value, such as vs[3] ← 0, vs[2] ← 0, vs[1] ← 0, vs[0] ← 0.
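The simplified scheme reduces to modulo-4 arithmetic on a 2-bit vsc. In this sketch, mesi is a hypothetical list of per-set states ("M", "E", "S" or "I") indexed by set number; the function names are illustrative only.

```python
def vsc_of(vs: list) -> int:
    """vsc <- (vs[3] || vs[2]) ^ (vs[1] || vs[0]) for one vs bit per set."""
    return ((vs[3] << 1) | vs[2]) ^ ((vs[1] << 1) | vs[0])


def next_vsc(vsc: int, mesi: list, stream: bool = False) -> int:
    """Prefer an invalid set; otherwise rotate (normal) or keep (LineStream/SubStream)."""
    for k in (1, 2, 3):
        if mesi[(vsc + k) % 4] == "I":
            return (vsc + k) % 4
    return vsc if stream else (vsc + 1) % 4


def vsc_after_flush(vsc: int, mesi: list, fs: int) -> int:
    """Flushed set fs becomes highest priority unless the current one is already invalid."""
    return vsc if mesi[vsc] == "I" else fs


def rewrite_low_hexlet(vs: list, new_vsc: int) -> list:
    """Update only vs[1] and vs[0] so that vsc_of(result) == new_vsc."""
    out = list(vs)
    out[1] = vs[3] ^ (new_vsc >> 1)
    out[0] = vs[2] ^ (new_vsc & 1)
    return out


valid = ["S"] * 4
print(next_vsc(0, valid), next_vsc(0, valid, stream=True))  # 1 0
```

Because any 2-bit vsc names a valid rotation, every vs pattern is legal, unlike the full scheme, where the three named sets must be distinct.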

Full victim selection ordering for additional sets

To extend the full victim-ordering scheme to eight sets, 3*7 = 21 bits are needed, which divided among two
tags is 11 bits per tag. This is somewhat generous, as the minimum required is 8*7*6*5*4*3*2*1 = 40320
orderings, which can be represented in as few as 16 bits. Extending the full victim-ordering four-set scheme
above to represent the first 4 priorities in binary, but to use 2 bits for each of the next 3 priorities, requires
3+3+3+3+2+2+2 = 18 bits. Representing fewer distinct orderings can further reduce the number of bits
used. As an extreme example, using the simplified scheme above with eight sets requires only 3 bits, which
divided among two tags is 2 bits per tag.
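The bit counts in the paragraph above can be reproduced with a few lines of integer arithmetic. This only restates the document's arithmetic; bits_needed and per_tag are illustrative helper names.

```python
from math import factorial


def bits_needed(n_orderings: int) -> int:
    """Smallest b with 2**b >= n_orderings."""
    return (n_orderings - 1).bit_length()


def per_tag(total_bits: int) -> int:
    """Bits per tag when split across the two tag hexlets (rounded up)."""
    return -(-total_bits // 2)


sets = 8
full = 3 * (sets - 1)                 # 3 bits for each of 7 encoded priorities
minimum = bits_needed(factorial(sets))
hybrid = 3 * 4 + 2 * 3                # first 4 priorities in binary, next 3 with 2 bits
simplified = 3                        # one rotating 3-bit index
print(full, per_tag(full), factorial(sets), minimum, hybrid, simplified, per_tag(simplified))
# 21 11 40320 16 18 3 2
```

The same helper gives bits_needed(24) == 5 for the four-set case discussed earlier.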

Victim selection without LOC tag bits

At extreme values of the niche limit register (nl in the range 121..124), the bit normally used
to hold the vs bit is usurped for use as a physical address bit. Under these conditions, no vsc
value is maintained per cache line; instead a single, global vsc value (gvsc) is used to select
victims for cache replacement. In this case, the cache consists of four lines, each with four sets.
On each replacement a new set s is computed from:

gvsc ← gvsc + 1
s ← gvsc1..0

The algorithm above is designed to utilize all four sets on sequential access to memory.
At even more extreme values of the niche limit register (nl in the range 125..127), not only
is the bit normally used to hold the vs bit usurped for use as a physical address bit, but
there is a deficit of one or two physical address bits. In this case, the number of sets can be
reduced to encode physical address bits into the victim selection, allowing the choice of set
to indicate physical address bit 9 or bits 9..8. On each replacement a new set s is
computed from:

gvsc ← gvsc + 1
s ← pa9 || ((nl = 127) ? pa8 : gvsc0)

The algorithm above is designed to utilize all four sets on sequential access to memory.
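Both global-vsc rules can be combined into one small selector. This is a sketch under assumptions: gvsc is treated as a 2-bit counter, the physical-address bit positions (pa9, and pa8 at nl = 127) follow the equations above, and the class name and the error for smaller nl values are illustrative only.

```python
class GlobalVictimSelector:
    """Sketch: a single global gvsc replaces the per-line vs bits when the
    niche limit nl usurps them."""

    def __init__(self) -> None:
        self.gvsc = 0

    def next_set(self, nl: int, pa: int) -> int:
        self.gvsc = (self.gvsc + 1) & 3          # gvsc <- gvsc + 1 (2-bit, assumed)
        if 121 <= nl <= 124:
            return self.gvsc                     # s <- gvsc1..0
        if 125 <= nl <= 127:
            hi = (pa >> 9) & 1                   # set bit 1 encodes pa9
            lo = (pa >> 8) & 1 if nl == 127 else self.gvsc & 1
            return (hi << 1) | lo
        raise ValueError("nl below 121: per-line vs bits are used instead")


g = GlobalVictimSelector()
print([g.next_set(122, 0) for _ in range(4)])  # [1, 2, 3, 0]
```

In the 125..127 range the set number stores address bits that no longer fit in the tag, so victim choice is partly fixed by the address rather than by the rotating counter.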

Detail Access 

Detail access is an attribute which can be set on a cache block or translation region to
indicate that software needs to be consulted on each potential access, to determine whether
the access should proceed or not. Setting this attribute causes an exception trap to occur, by
which software can examine the virtual address, for example by locating data in a table, and,
if indicated, cause the processor to continue execution. In continuing, ephemeral state is set
upon returning to the re-execution of the instruction that prevents the exception trap from
recurring on this particular re-execution only. The ephemeral state is cleared as soon as the
instruction is either completed or subject to another exception, so DetailAccess exceptions
can recur on a subsequent execution of the same instruction. Alternatively, if the access is
not to proceed, execution has been trapped to software at this point, which can abort the
thread or take other corrective action.

The detail access attribute permits specification of access parameters over memory regions on
arbitrary byte boundaries. This is important for emulators, which must prevent store access
to code which has been translated, and for simulating machines which have byte granularity
on segment boundaries. The detail access attribute can also be applied to debuggers, which
have the need to set breakpoints on byte-level data, or which may use the feature to set code
breakpoints on instruction boundaries without altering the program code, enabling
breakpoints on code contained in ROM.

A one-bit field indicates the choice of detail access. A one (1) bit indicates detail access, while
a zero (0) bit indicates no detail access. Detail access is an attribute that can be set by the
LTB, the GTB, or a cache tag.

The table below indicates the proper status for all potential values of the detail access bits in
the LTB, GTB, and Tag:
LTB | GTB  | Tag  | status
0   | 0    | 0    | OK - normal
0   | 0    | 1    | AccessDetailRequiredByTag
0   | 1    | 0    | AccessDetailRequiredByGTB
0   | 1    | 1    | OK - GTB inhibited by Tag
1   | 0    | 0    | AccessDetailRequiredByLTB
1   | 0    | 1    | OK - LTB inhibited by Tag
1   | 1    | 0    | OK - LTB inhibited by GTB
1   | 1    | 1    | AccessDetailRequiredByTag
0   | Miss |      | GTBMiss
1   | Miss |      | AccessDetailRequiredByLTB
0   | 0    | Miss | Cache Miss
0   | 1    | Miss | AccessDetailRequiredByGTB
1   | 0    | Miss | AccessDetailRequiredByLTB
1   | 1    | Miss | Cache Miss
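The table can be encoded directly as a lookup. This sketch models a GTB miss as gtb=None and a cache miss as tag=None (an illustrative convention, not part of the architecture), and returns the status strings from the table.

```python
HIT_TABLE = {  # (LTB, GTB, Tag) detail-access bits -> status, from the table above
    (0, 0, 0): "OK - normal",
    (0, 0, 1): "AccessDetailRequiredByTag",
    (0, 1, 0): "AccessDetailRequiredByGTB",
    (0, 1, 1): "OK - GTB inhibited by Tag",
    (1, 0, 0): "AccessDetailRequiredByLTB",
    (1, 0, 1): "OK - LTB inhibited by Tag",
    (1, 1, 0): "OK - LTB inhibited by GTB",
    (1, 1, 1): "AccessDetailRequiredByTag",
}


def detail_access_status(ltb: int, gtb, tag) -> str:
    """gtb=None models a GTB miss; tag=None models a cache miss."""
    if gtb is None:
        return "AccessDetailRequiredByLTB" if ltb else "GTBMiss"
    if tag is None:
        if ltb and not gtb:
            return "AccessDetailRequiredByLTB"
        if gtb and not ltb:
            return "AccessDetailRequiredByGTB"
        return "Cache Miss"
    return HIT_TABLE[(ltb, gtb, tag)]


print(detail_access_status(1, 0, 1))  # OK - LTB inhibited by Tag
```

In the eight hit rows, an exception occurs exactly when an odd number of the three bits is set, which matches the "one bit traps, two bits inhibit" rule described next.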



The first eight nws sJhiw appropriate activities when all three bits are available. "Hie detail 
access attributes for the LTB, GTB. and cache tag work tt>gcthcr tt» define whether ami 
which kind of detail access exception trap occurs. Generally, setting a single attribute bit 
causes an exception, while setting two bits inhibits such exceptions. In this way, a detail 
access excepnon can be narrowed down to cause an exception over a specified region of 
memory: Software generally will set the cache tag detail access bit only for regions in w hich 
the LTB or GTB also has a detail access bit set. Because cache activity may flush and refill 
cache hnes impliciry, it is not generally useful to set the cache tag detail access bit alone, but 
if this occurs, the AcccssDcrailRcquircdByTag exception catches such an attempt. 

The next two rows show appropriate activities on a GTB miss. On a GTB miss, the
detail access bit in the GTB is not present. If the LTB indicates detail access and the GTB
misses, the AccessDetailRequiredByLTB exception should be indicated. If software
continues from the AccessDetailRequiredByLTB exception and has not filled in the GTB,
the GTBMiss exception happens next. Since the GTBMiss exception is not a continuation
exception, a re-execution after the GTBMiss exception can cause a reoccurrence of the
AccessDetailRequiredByLTB exception. Alternatively, if software continues from the
AccessDetailRequiredByLTB exception and has filled in the GTB, the
AccessDetailRequiredByLTB exception is inhibited for that reference, no matter what the
status of the GTB and Tag detail bits, but the re-executed instruction is still subject to the
AccessDetailRequiredByGTB and AccessDetailRequiredByTag exceptions.

The last four rows show appropriate activities for a cache miss. On a cache miss, the detail
access bit in the tag is not present. If the LTB or GTB indicates detail access and the cache
misses, the AccessDetailRequiredByLTB or AccessDetailRequiredByGTB exception should
be indicated. If software continues from these exceptions and has not filled in the cache, a
cache miss happens next. If software continues from the AccessDetailRequiredByLTB or
AccessDetailRequiredByGTB exception and has filled in the cache, the previous exception is
inhibited for that reference, no matter what the status of the Tag detail bit, but the reference is still subject
to the AccessDetailRequiredByTag exception. When the detail bit must be created from a
cache miss, the initial value filled in is zero. Software may set the bit, thus turning off
AccessDetailRequired exceptions per cache line. If the cache line is flushed and refilled, the
detail access bit in the cache tag is again reset to zero, and another AccessDetailRequired
exception occurs.

Settings of the niche limit parameter to values that require use of the da bit in the LOC tag
for retaining the physical address usurp the capability to set the Tag detail access bit. Under
such conditions, the Tag detail access bit is effectively always zero (0), so it cannot inhibit
AccessDetailRequiredByLTB, inhibit AccessDetailRequiredByGTB, or cause
AccessDetailRequiredByTag.

The execution of a Zeus instruction has a reference to one quadlet of instruction, which may be subject to the DetailAccess exceptions, and a reference to data, which may be unaligned or wide. These unaligned or wide references may cross GTB or cache boundaries, and thus involve multiple separate references that are combined together, each of which may be subject to the DetailAccess exception. There is sufficient information in the DetailAccess exception handler to process unaligned or wide references.

The implementation is free to indicate DetailAccess exceptions for unaligned and wide data references either in combined form, or with each sub-reference separated. For example, in an unaligned reference that crosses a GTB or cache boundary, a DetailAccess exception may be indicated for a portion of the reference. The exception may report the virtual address and size of the complete reference, and upon continuing, may inhibit reoccurrence of the DetailAccess exception for any portion of the reference. Alternatively, it may report the virtual address and size of only a reference portion and inhibit reoccurrence of the DetailAccess exception for only that portion of the reference, subject to another DetailAccess exception occurring for the remaining portion of the reference.

Micro Translation Buffer 

The Micro Translation Buffer (MTB) is an implementation-dependent structure which reduces the access traffic to the GTB and the LOC tags. The MTB contains and caches information read from the GTB and LOC tags, and is consulted on each access to the LOC.

To access the LOC, a global address is supplied to the Micro Translation Buffer (MTB), which associatively looks up the global address into a table holding a subset of the LOC tags. In addition, each table entry contains the physical address bits 14..8 (7 bits) and set identifier (2 bits) required to access the LOC data.

In the first Zeus implementation, there are two MTB blocks: MTB 0 is used for threads 0 and 1, and MTB 1 is used for threads 2 and 3. Per clock cycle, each MTB block can check for 4 simultaneous references to the LOC. Each MTB block has 16 entries.

Each MTB entry consists of a bit less than 128 bits of information, including a 56-bit global address tag, 8 bits of privilege level required for read, write, execute, and gateway access, a detail bit, and 10 bits of cache state: the MESI state and, for each mclet (32-byte) sub-block, a valid bit.
[Figure: MTB entry format, with the 56-bit global address match field]

The output of the MTB combines physical address and protection information from the GTB and the referenced cache line.
The meaning of the fields is given by the following table:

name  size  meaning
ga    56    global address
gi    7     GTB index
ci    7     cache index
si    2     set index
vs    12    victim select
da    1     detail access (from cache line)
mesi  2     coherency: modified (3), exclusive (2), shared (1), invalid (0)
tv    8     mclet valid (1) or invalid (0)
g     2     minimum privilege required for gateway access
x     2     minimum privilege required for execute access
w     2     minimum privilege required for write access
r     2     minimum privilege required for read access
0     1     reserved
da    1     detail access (from GTB)
so    1     strong ordering
cc    3     cache control


With an MTB hit, the resulting cache index (14..8 from the MTB, bit 7 from the LA) and set identifier (2 bits from the MTB) are applied to the LOC data bank selected from bits 6..4 of the GVA. The access protection information (pl and rwxg) is supplied from the MTB.
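As an illustration only, the address formation for an MTB hit can be written out bit by bit. The sketch assumes a single reference address supplies both bit 7 (taken from the LA in the text) and bank-select bits 6..4 (taken from the GVA), that is, that these low-order bits agree; the function name `loc_data_select` is hypothetical.

```python
from typing import Tuple


def loc_data_select(mtb_ci: int, mtb_si: int, addr: int) -> Tuple[int, int, int]:
    """Form the LOC data access for an MTB hit.

    mtb_ci: 7-bit cache index (physical address bits 14..8) from the MTB entry
    mtb_si: 2-bit set identifier from the MTB entry
    addr:   reference address supplying bit 7 and bank-select bits 6..4
    Returns (cache index bits 14..7, set, bank).
    """
    assert 0 <= mtb_ci < (1 << 7) and 0 <= mtb_si < 4
    index = (mtb_ci << 1) | ((addr >> 7) & 1)  # bits 14..8 from MTB, bit 7 from address
    bank = (addr >> 4) & 0x7                   # one of eight LOC data banks
    return index, mtb_si, bank
```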

With an MTB (and BTB) miss, a victim entry is selected for replacement. The MTB and BTB are always clean, so the victim entry is discarded without a writeback. The GTB (described below) is referenced to obtain a physical address and protection information. Depending on the access information in the GTB, either the MTB or BTB is filled.

Note that the processing of the physical address pa[14..8] against the niche limit nl can be performed on the physical address from the GTB, producing the LOC address, ci. The LOC address, after processing against the nl, is placed into the MTB directly, reducing the latency of an MTB hit.

Four tags are fetched from the LOC tags and compared against the PA to determine which of the four sets contains the data. If one of the four sets contains the correct physical address, a victim MTB entry is selected for replacement, the MTB is filled and the LOC access proceeds. If none of the four sets is a hit, an LOC miss occurs.
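A sketch of this MTB miss path, assuming the MTB is modelled as an ordered map from global address to (cache index, set index) and that the victim is the oldest entry. The text does not specify the victim-selection policy, so that choice is an assumption; the 16-entry capacity follows the first implementation described above, and `mtb_miss_fill` is a hypothetical name.

```python
from collections import OrderedDict
from typing import List, Optional, Tuple

MTB_ENTRIES = 16  # entries per MTB block in the first Zeus implementation


def mtb_miss_fill(mtb: "OrderedDict[int, Tuple[int, int]]", ga: int, ci: int,
                  set_tags: List[Optional[int]], pa_tag: int) -> Optional[int]:
    """Compare the four LOC set tags against the physical address tag.

    On a match, evict a victim if the MTB is full, fill the entry for global
    address ga with (cache index, set index) and return the set index.
    Returns None on an LOC miss, leaving the MTB unchanged.
    """
    assert len(set_tags) == 4
    for si, tag in enumerate(set_tags):
        if tag is not None and tag == pa_tag:
            if len(mtb) >= MTB_ENTRIES:
                # Victim: oldest entry (assumed policy). MTB entries are
                # clean, so the victim is simply discarded.
                mtb.popitem(last=False)
            mtb[ga] = (ci, si)
            return si
    return None
```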

[Figure: MTB miss handling (MTB miss, GTB CAM, LOC tag, MTB fill, MTB victim, LOC miss)]

The operation of the MTB is largely not visible to software: hardware mechanisms are responsible for automatically initializing, filling and flushing the MTB. Activity that modifies the GTB or LOC tag state may require that one or more MTB entries be flushed.
A write to the GTBUpdate register that updates a matching entry, a write to the GTBUpdateFill register, or a direct write to the GTB all flush relevant entries from the MTB. MTB flushing is accomplished by searching MTB entries for values that match on the gi field with the GTB entry that has been modified. Each such matching MTB entry is flushed.
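The flush can be sketched as a filter over MTB entries, assuming the match key is the gi (GTB index) field of the field table above. `MTBEntry` is a hypothetical, reduced record holding only the fields this example needs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MTBEntry:
    ga: int  # global address tag
    gi: int  # index of the GTB entry this MTB entry was filled from
    ci: int  # cache index
    si: int  # set index


def flush_for_gtb_write(mtb: List[MTBEntry], modified_gi: int) -> List[MTBEntry]:
    """Keep only MTB entries that do not derive from the modified GTB entry."""
    return [entry for entry in mtb if entry.gi != modified_gi]
```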

The MTB is kept synchronous with the LOC tags, particularly with respect to MESI state. On an LOC miss or LOC snoop, any changes in MESI state update (or flush) MTB entries which physically match the address. If the MTB contains less than the full physical address, it is sufficient to retain the LOC physical address (ci || vs || si).

Block Translation Buffer 

Zeus has a per-thread "Block Translation Buffer" (BTB). The BTB retains GTB information for uncached address blocks. The BTB is used in parallel with the MTB; exactly one of the BTB or MTB may translate a particular reference. When both the BTB and MTB miss, the GTB is consulted, and depending on the result, the block is filled into either the MTB or BTB as appropriate. In the first Zeus implementation, the BTB has 2 entries for each thread.

BTB entries cover any power-of-two granularity, as they retain the size information from the GTB. BTB entries contain no MESI state, as they only contain uncached blocks.

Each BTB entry consists of 128 bits of information, containing the same information in the same format as a GTB entry.
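Because BTB entries retain the GTB block size, a lookup only has to compare the address bits above the block size. A minimal sketch, assuming each entry is reduced to an aligned base address and a log2 block size (the real entry holds the full 128-bit GTB format); `BTBEntry` and `btb_lookup` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BTBEntry:
    base: int       # block base address, aligned to the block size
    log2_size: int  # block size (a power of two), retained from the GTB

    def matches(self, ga: int) -> bool:
        # Ignore the address bits inside the block; compare the rest.
        return (ga >> self.log2_size) == (self.base >> self.log2_size)


def btb_lookup(btb: List[BTBEntry], ga: int) -> Optional[BTBEntry]:
    for entry in btb:
        if entry.matches(ga):
            return entry
    return None  # BTB miss: the MTB, or failing that the GTB, decides
```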

Niche blocks are indicated by GTB information, and correspond to blocks of data that are retained in the LOC and never miss. A special physical address range indicates niche blocks. For this address range, the BTB enables use of the LOC as a niche memory, generating the "set select" address bits from low-order address bits. There is no checking of the LOC tags for consistent use of the LOC as a niche: the nl field must be preset by software so that LOC cache replacement never claims the LOC niche space, and only BTB miss and protection bits prevent software from using the cache portion of the LOC as niche.

Other address ranges include other on-chip resources, such as bus interface registers, the control register and status register, as well as off-chip memory, accessed through the bus interface. Each of these regions is accessible as uncached memory.

Program Translation Buffer 

Later implementations of Zeus may optionally have a per-thread "Program Translation Buffer" (PTB). The PTB retains GTB and LOC cache tag information. The PTB enables generation of LOC instruction fetching in parallel with load/store fetching. The PTB is updated when instruction fetching crosses a cache line boundary (each 64 instructions in straight-line code). The PTB functions similarly to a one-entry MTB, but can use the sequential nature of program code fetching to avoid checking the 56-bit match. The PTB is flushed at the same time as the MTB.

The initial implementation of Zeus has no PTB; the MTB suffices for this function.
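The "each 64 instructions" figure follows from 32-bit (quadlet) instructions and a cache line of 64 × 4 = 256 bytes; that line size is an assumption here, consistent with eight 32-byte mclets per line. The sketch below counts PTB updates for a run of straight-line code under that assumption, counting the initial fetch as an update; `ptb_updates` is a hypothetical name.

```python
LINE_BYTES = 256  # assumed line size: 64 quadlet instructions
INSN_BYTES = 4    # one quadlet per instruction


def ptb_updates(start_pc: int, n_insns: int) -> int:
    """Count PTB updates while fetching n_insns straight-line instructions."""
    updates = 0
    current_line = None
    for k in range(n_insns):
        line = (start_pc + k * INSN_BYTES) // LINE_BYTES
        if line != current_line:  # fetch has entered a new cache line
            updates += 1
            current_line = line
    return updates
```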
Global Virtual Cache

The initial implementation of Zeus contains a cache which is both indexed and tagged by a physical address. Other prototype implementations have used a global virtual address to index and/or tag an internal cache. This section will define the required characteristics of a global virtually indexed cache.

Memory Interface 

Dedicated hardware mechanisms are provided to fetch data blocks into the level zero and one caches, provided that a matching entry can be found in the MTB or GTB (or if the MMU is disabled). Dedicated hardware mechanisms are provided to store back data blocks in the level zero and one caches, regardless of the state of the MTB and GTB. When no entry is to be found in the GTB, an exception handler is invoked either to generate the required information from the virtual address, or to place an entry in the GTB to provide for automatic handling of this and other similarly addressed data blocks.

The initial implementation of Zeus accesses the remainder of the memory system through the "Socket 7" interface. Via this interface, Zeus accesses a secondary cache, DRAM memory, external ROM memory, and an I/O system. The size and presence of the secondary cache and the DRAM memory array, and the contents of the external ROM memory and the I/O system, are variables in the processor environment.

Microarchitecture 

Each thread has two address generation units, capable of producing two aligned load or store operations, or one unaligned load or store operation, per cycle. Alternatively, these units may produce a single load or store address and a branch target address.

Each thread has an LTB, which translates the two addresses into global virtual addresses.

Each pair of threads has an MTB, which looks up the four references into the LOC. The PTB provides for additional references that are program code fetches.

In parallel with the MTB, these four references are combined with the four references from the other thread pair and partitioned into even and odd hexlet references. Up to four references are selected for each of the even and odd portions of the LZC. One reference for each of the eight banks of the LOC (four are even hexlets; four are odd hexlets) is selected from the eight load/store/branch references and the PTB references.

Some references may be directed to both the LZC and LOC, in which case the LZC hit causes the LOC data to be ignored. An LZC miss which hits in the MTB is filled from the LOC to the LZC. An LZC miss which misses in the MTB causes a GTB access and LOC tag access, then an MTB fill and LOC access, then an LZC fill.

Priority of access (highest to lowest): cache dump, cache fill, load, program, store.
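The bank selection and priority ordering can be sketched as follows. This assumes the bank of a reference is address bits 6..4 (as in the MTB hit description above), so even-numbered banks hold even hexlets, and that each bank simply takes its highest-priority reference in the listed order, with earlier requests winning ties. The real arbitration logic is not specified here, and `select_per_bank` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# Priority of access from the text, highest first.
PRIORITY = ["cache dump", "cache fill", "load", "program", "store"]


def select_per_bank(refs: List[Tuple[str, int]]) -> Dict[int, Tuple[str, int]]:
    """Pick at most one (kind, address) reference for each of the eight LOC banks."""
    chosen: Dict[int, Tuple[str, int]] = {}
    for kind, addr in refs:
        bank = (addr >> 4) & 0x7  # bit 4 set means an odd hexlet bank
        best = chosen.get(bank)
        if best is None or PRIORITY.index(kind) < PRIORITY.index(best[0]):
            chosen[bank] = (kind, addr)
    return chosen
```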
Snoop

The "Socket 7" bus requires certain bus accesses to be checked against on-chip caches. On a bus read, the address is checked against the on-chip caches, with accesses aborted when requested data is in an internal cache in the M state; when requested data is in the E state, the internal cache is changed to the S state. On a bus write, data written must update data in on-chip caches. To meet these requirements, physical bus addresses must be checked against the LOC tags.

The S7 bus requires that responses to inquire cycles occur with fixed timing. At least with certain combinations of bus and processor clock rate, inquire cycles will require top priority to meet the inquire response timing requirement.

Synchronization operations must take into account bus activity: generally a synchronization operation can only proceed on cached data which is in the Exclusive or Modified state; if cached data is in the Shared state, ownership must be obtained. Data that is not cached must be accessed using locked bus cycles.
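The MESI effects of a snooped bus read, and the precondition for synchronization operations, can be summarized in a short sketch. It models only what the text states: the handling that follows an M-state abort is deliberately left out, and treating the I state as "not cached" is an assumption. Function names are hypothetical.

```python
from typing import Tuple


def snoop_bus_read(state: str) -> Tuple[str, bool]:
    """(new MESI state, abort) for a bus read whose address hits an on-chip line."""
    if state == "M":
        return "M", True      # access aborted; what follows is not modelled here
    if state == "E":
        return "S", False     # exclusive copy becomes shared
    return state, False       # S and I lines are unaffected


def sync_requirement(state: str) -> str:
    """What a synchronization operation needs before it can proceed."""
    if state in ("E", "M"):
        return "proceed"
    if state == "S":
        return "obtain ownership"
    return "locked bus cycles"  # I: treated as not cached (assumption)
```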

Load 

Load operations require partitioning into reads that do not cross a hexlet (128-bit) boundary, checking for store conflicts, checking the LZC, checking the LOC, and reading from memory. Execute and Gateway accesses are always aligned and, since they are smaller than a hexlet, do not cross a hexlet boundary.

Note: S7 processors perform unaligned operations LSB first, MSB last, up to 64 bits at a time. Unaligned 128-bit loads need three 64-bit operations: LSB, octlet, MSB. Transfers which are smaller than a hexlet but larger than an octlet are further divided in the S7 bus unit.
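The S7 splitting rule can be illustrated by breaking a transfer into pieces that each stay within one aligned octlet, lowest address first. This is a sketch of the rule as stated, not of the S7 bus unit itself, and `s7_bus_pieces` is a hypothetical name; an unaligned 16-byte access yields the three operations described above (LSB part, full octlet, MSB part).

```python
from typing import List, Tuple


def s7_bus_pieces(addr: int, nbytes: int) -> List[Tuple[int, int]]:
    """Split [addr, addr+nbytes) into (address, length) bus operations that
    never cross an aligned 8-byte (octlet) boundary, lowest address first."""
    pieces = []
    end = addr + nbytes
    while addr < end:
        stop = min(end, (addr | 7) + 1)  # next octlet boundary or the end
        pieces.append((addr, stop - addr))
        addr = stop
    return pieces
```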

def data ← LoadMemoryX(ba,la,size,order)
  assert (order = L) and ((la and (size/8-1)) = 0) and (size = 32)
  hdata ← TranslateAndCacheAccess(ba,la,size,X,0)
  data ← hdata[31+8*(la and 15)..8*(la and 15)]
enddef

def data ← LoadMemoryG(ba,la,size,order)
  assert (order = L) and ((la and (size/8-1)) = 0) and (size = 64)
  hdata ← TranslateAndCacheAccess(ba,la,size,G,0)
  data ← hdata[63+8*(la and 15)..8*(la and 15)]
enddef

def data ← LoadMemory(ba,la,size,order)
  if (size > 128) then
    data0 ← LoadMemory(ba, la, size/2, order)
    data1 ← LoadMemory(ba, la+(size/16), size/2, order)
    case order of
      L:
        data ← data1 || data0
      B:
        data ← data0 || data1
    endcase
  else
    bs ← 8*(la and 15)
    be ← bs + size
    if be > 128 then
      data0 ← LoadMemory(ba, la, 128 - bs, order)
      data1 ← LoadMemory(ba, (la[63..4] + 1) || 0^4, be - 128, order)
      case order of
        L:
          data ← data1 || data0
        B:
          data ← data0 || data1
      endcase
    else
      hdata ← TranslateAndCacheAccess(ba,la,size,R,0)
      for i ← 0 to size-8 by 8
        j ← bs + ((order=L) ? i : size-8-i)
        data[i+7..i] ← hdata[j+7..j]
      endfor
    endif
  endif
enddef
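The LoadMemory definition can be exercised with a small executable model. It assumes memory is a flat little-endian byte array, replaces TranslateAndCacheAccess with a hypothetical `read_hexlet` that returns the aligned hexlet, and treats size as a bit count and la as a byte address, as the definition's use of bs = 8*(la and 15) implies.

```python
def read_hexlet(mem: bytearray, la: int) -> int:
    """Hypothetical stand-in for TranslateAndCacheAccess(ba,la,size,R,0):
    return the aligned 128-bit hexlet holding byte address la, with byte k
    of the hexlet at bits 8k+7..8k."""
    base = la & ~15
    return int.from_bytes(mem[base:base + 16], "little")


def load_memory(mem: bytearray, la: int, size: int, order: str) -> int:
    """Model of LoadMemory; size in bits (multiple of 8), order "L" or "B"."""
    if size > 128:
        half = size // 2
        data0 = load_memory(mem, la, half, order)
        data1 = load_memory(mem, la + size // 16, half, order)  # byte offset of second half
        return (data1 << half) | data0 if order == "L" else (data0 << half) | data1
    bs = 8 * (la & 15)
    be = bs + size
    if be > 128:  # crosses a hexlet boundary: split at the boundary
        first, rest = 128 - bs, be - 128
        data0 = load_memory(mem, la, first, order)
        data1 = load_memory(mem, (la | 15) + 1, rest, order)
        return (data1 << first) | data0 if order == "L" else (data0 << rest) | data1
    hdata = read_hexlet(mem, la)
    data = 0
    for i in range(0, size, 8):
        j = bs + (i if order == "L" else size - 8 - i)
        data |= ((hdata >> j) & 0xFF) << i
    return data
```

For example, an unaligned 64-bit load at byte address 13 splits into a 24-bit part and a 40-bit part and returns the same value as reading bytes 13 through 20 in the requested byte order.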

Store 

Store operations require partitioning into stores less than 128 bits that do not cross hexlet boundaries, checking for store conflicts, checking the LZC, checking the LOC, and storing into memory.

Definition

def StoreMemory(ba,la,size,order,data)
  bs ← 8*(la and 15)
  be ← bs + size
  if be > 128 then
    case order of
      L:
        data0 ← data[127-bs..0]
        data1 ← data[size-1..128-bs]
      B:
        data0 ← data[size-1..size-128+bs]
        data1 ← data[size-129+bs..0]
    endcase
    StoreMemory(ba, la, 128 - bs, order, data0)
    StoreMemory(ba, (la[63..4] + 1) || 0^4, be - 128, order, data1)
  else
    for i ← 0 to size-8 by 8
      j ← bs + ((order=L) ? i : size-8-i)
      hdata[j+7..j] ← data[i+7..i]
    endfor
    xdata ← TranslateAndCacheAccess(ba,la,size,W,hdata)
  endif
enddef
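A matching model of the StoreMemory definition, under the same assumptions as a load model would use (flat byte array, size in bits, la a byte address). The write step at the end is a hypothetical stand-in for TranslateAndCacheAccess(ba,la,size,W,hdata) that writes only the size/8 bytes starting at la.

```python
def store_memory(mem: bytearray, la: int, size: int, order: str, data: int) -> None:
    """Model of StoreMemory; size in bits (multiple of 8), order "L" or "B"."""
    bs = 8 * (la & 15)
    be = bs + size
    if be > 128:  # crosses a hexlet boundary: split the data at the boundary
        first, rest = 128 - bs, be - 128
        if order == "L":
            data0 = data & ((1 << first) - 1)  # data[127-bs..0]
            data1 = data >> first              # data[size-1..128-bs]
        else:
            data0 = data >> rest               # data[size-1..size-128+bs]
            data1 = data & ((1 << rest) - 1)   # data[size-129+bs..0]
        store_memory(mem, la, first, order, data0)
        store_memory(mem, (la | 15) + 1, rest, order, data1)
        return
    hdata = 0
    for i in range(0, size, 8):
        j = bs + (i if order == "L" else size - 8 - i)
        hdata |= ((data >> i) & 0xFF) << j
    # Stand-in for the cache write: only the addressed bytes of the hexlet change.
    base = la & ~15
    for k in range(la & 15, (la & 15) + size // 8):
        mem[base + k] = (hdata >> (8 * k)) & 0xFF
```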






MicroUnity

Zeus System Architecture Tue, Aug 17, 1999 Memory Management



Memory 

Memory operations require first translating via the LTB and GTB, checking for access exceptions, then accessing the cache.

Definition

def hdata ← TranslateAndCacheAccess(ba,la,size,rwxg,hwdata)
  if ControlRegister[52] then
    case rwxg of
      R:
        at ← 0
      W:
        at ← 1
      X:
        at ← 2
      G:
        at ← 3
    endcase
    rw ← (rwxg=W) ? W : R
    ga,LocalProtect ← LocalTranslation(th,ba,la,pl)
    if LocalProtect[9+2*at..8+2*at] < pl then
      raise AccessDisallowedByLTB
    endif
    lda ← LocalProtect[4]
    pa,GlobalProtect ← GlobalTranslation(th,ga,pl,lda)
    if GlobalProtect[9+2*at..8+2*at] < pl then
      raise AccessDisallowedByGTB
    endif
    cc ← (LocalProtect[2..0] > GlobalProtect[2..0]) ? LocalProtect[2..0] : GlobalProtect[2..0]
    so ← LocalProtect[3] or GlobalProtect[3]
    gda ← GlobalProtect[4]
    hdata,TagProtect ← LevelOneCacheAccess(pa,size,lda,gda,cc,so,rw,hwdata)
    if (lda ^ gda ^ TagProtect) = 1 then
      if TagProtect then
        PerformAccessDetail(AccessDetailRequiredByTag)
      elseif gda then
        PerformAccessDetail(AccessDetailRequiredByGlobalTB)
      else
        PerformAccessDetail(AccessDetailRequiredByLocalTB)
      endif
    endif
  else
    case rwxg of
      R, X, G:
        hdata ← ReadPhysical(la,size)
      W:
        WritePhysical(la,size,hwdata)
    endcase
  endif
enddef
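The privilege checks and the merging of local and global protection can be sketched in Python. The bit positions (privilege fields at 9+2*at..8+2*at, cc at 2..0, so at 3, da at 4) and the comparison against pl are taken from the definition as written; `AccessDisallowed`, `required_privilege` and `check_and_combine` are hypothetical names.

```python
from typing import Dict

AT = {"R": 0, "W": 1, "X": 2, "G": 3}  # access type encoding used by the definition


class AccessDisallowed(Exception):
    """Hypothetical stand-in for AccessDisallowedByLTB / AccessDisallowedByGTB."""


def required_privilege(protect: int, rwxg: str) -> int:
    # Two-bit privilege field at bits 9+2*at..8+2*at.
    return (protect >> (8 + 2 * AT[rwxg])) & 0x3


def check_and_combine(local_protect: int, global_protect: int,
                      rwxg: str, pl: int) -> Dict[str, int]:
    if required_privilege(local_protect, rwxg) < pl:
        raise AccessDisallowed("AccessDisallowedByLTB")
    if required_privilege(global_protect, rwxg) < pl:
        raise AccessDisallowed("AccessDisallowedByGTB")
    return {
        "cc": max(local_protect & 0x7, global_protect & 0x7),  # larger cache control wins
        "so": ((local_protect | global_protect) >> 3) & 1,    # strong ordering if either sets it
        "lda": (local_protect >> 4) & 1,
        "gda": (global_protect >> 4) & 1,
    }
```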
Bus interface

The initial implementation of the Zeus processor uses a "Super Socket 7 compatible" (SS7) bus interface, which is generally similar to and compatible with other "Socket 7" and "Super Socket 7" designs, such as the Intel Pentium and Pentium with MMX Technology, the AMD K6, the Cyrix 6x86, and other "Socket 7" processors.

The bus interface behavior is quite complex, but well known due to the leading position of the Intel Pentium design. This document does not yet contain all the detailed information related to this bus, and will concentrate on the differences between the Zeus SS7 bus and other designs. For functional specification and pin interface behavior, the Pentium Processor … Data Sheet is a primary reference.



Motherboard Chipset 

The following motherboard chipsets are designed for the 100 MHz "Socket 7" bus:

Manufacturer                 Website           Chipset         Clock rate  North bridge  South bridge
VIA Technologies, Inc.       www.via.com.tw    Apollo MVP3     100 MHz     VT82C598…     VT82C586B
Silicon Integrated Systems   www.sis.com.tw    SiS 5591/5592   75 MHz      SiS 5591      SiS 5595
Acer Laboratories, Inc.      www.acerlabs.com  ALi Aladdin V   100 MHz     M1541         M1543C



The following processors are designed for a "Socket 7" bus:

Manufacturer             Website           Chips         Clock rate
Advanced Micro Devices   www.amd.com       K6-2          100 MHz
Advanced Micro Devices   www.amd.com       K6-3          100 MHz
Intel                    www.intel.com     Pentium MMX   66 MHz
IDT/Centaur              www.winchip.com   WinChip C6    75 MHz
IDT/Centaur              www.winchip.com   WinChip 2     100 MHz
IDT/Centaur              www.winchip.com   WinChip 2A    100 MHz
IDT/Centaur              www.winchip.com   WinChip 4     100 MHz
NSM/Cyrix                www.cyrix.com





Pinout 



In the diagram below, pins which are different from the Pentium pinout are indicated by ….



[Figure: Zeus "Super Socket 7" pinout diagram]



Pin summary 



Pin          I/O  Description
A20M#        I    Address bit 20 Mask is an emulator signal.
A31..A3      IO   Address, in combination with byte enable, indicates the
                  physical addresses of memory or device that is the target
                  of a bus transaction. This signal is an output when the
                  processor is initiating the bus transaction, and an input
                  when the processor is receiving an inquire transaction or
                  snooping another processor's bus transaction.
ADS#         IO   ADdress Strobe, when asserted, indicates a new bus
                  transaction by the processor, with valid address and byte
                  enable simultaneously driven.
ADSC#        O    Address Strobe Copy is driven identically to address
                  strobe.
AHOLD        I    Address HOLD, when asserted, causes the processor to cease
                  driving address and address parity in the next bus clock
                  cycle.
AP           IO   Address Parity contains even parity on the same cycle as
                  address. Address parity is generated by the processor when
                  address is an output and is checked when address is an
                  input. A parity error causes a bus error machine check.
APCHK#       O    Address Parity CHecK is asserted two bus clocks after EADS#
                  if address parity is not even parity of address.
APICEN       I    Advanced Programmable Interrupt Controller ENable is not
                  implemented.
BE7#..BE0#   IO   Byte Enable indicates which bytes are the subject of a read
                  or write transaction and are driven on the same cycle as
                  address.
BF1..BF0     I    Bus Frequency is sampled to permit software to select the
                  ratio of the processor clock to the bus clock.
BOFF#        I    BackOFF is sampled on the rising edge of each bus clock, and
                  when asserted, the processor floats bus signals on the next
                  bus clock and aborts the current bus cycle, until the
                  backoff signal is sampled negated.
BP3..BP0     O    Breakpoint is an emulator signal.
BRDY#        I    Bus ReaDY indicates that valid data is present on data on a
                  read transaction, or that data has been accepted on a write
                  transaction.
BRDYC#       I    Bus ReaDY Copy is identical to BRDY#, asserting …
BREQ         O    Bus REQuest indicates a processor-initiated bus request.
BUSCHK#      I    BUS CHecK is sampled on the rising edge of the bus clock,
                  and when asserted, causes a bus error machine check.
CACHE#       O    CACHE, when asserted, indicates a cacheable read transaction
                  or a burst write transaction.
CLK          I    bus CLocK provides the bus clock timing edge and the
                  frequency reference for the processor clock.
CPUTYP       I    CPU TYPe, if low, indicates the primary processor; if high,
                  the dual processor.
D/C#         O    Data/Code is driven with the address signal to indicate
                  data, code, or special cycles.
D63..D0      IO   Data communicates 64 bits of data per bus clock.
D/P#         O    Dual/Primary is driven (asserted, low) with address on the
                  primary processor.
DP7..DP0     IO   Data Parity contains even parity on the same cycle as data.
                  A parity error causes a bus error machine check.
DPEN#        IO   Dual Processing Enable is asserted (driven low) by a Dual
                  processor at reset and sampled by a Primary processor at the
                  falling edge of reset.
EADS#        I    External Address Strobe indicates that an external device
                  has driven address for an inquire cycle.
EWBE#        I    External Write Buffer Empty indicates that the external
                  system has no pending write.
FERR#        O    Floating point ERRor is an emulator signal.
FLUSH#       I    cache FLUSH is an emulator signal.
FRCMC#       I    Functional Redundancy Checking Master/Checker is not
                  implemented.
HIT#         IO   HIT indicates that an inquire cycle or cache snoop hits a
                  valid line.
HITM#        IO   HIT to a Modified line indicates that an inquire cycle or
                  cache snoop hits a sub-block in the M cache state.
HLDA         O    bus HoLD Acknowledge is asserted (driven high) to
                  acknowledge a bus hold request.
HOLD         I    bus HOLD request causes the processor to float most of its
                  pins and assert bus hold acknowledge after completing all
                  outstanding bus transactions, or during reset.
IERR#        O    Internal ERRor is an emulator signal.
IGNNE#       I    IGNore Numeric Error is an emulator signal.
INIT         I    INITialization is an emulator signal.
INTR         I    maskable INTeRrupt is an emulator signal.
INV          I    INValidation controls whether to invalidate the addressed
                  cache sub-block on an inquire transaction.
KEN#         I    Cache ENable is driven with address to indicate that the
                  read or write transaction is cacheable.
LINT1..LINT0 I    Local INTerrupt is not implemented.
LOCK#        O    bus LOCK is driven starting with address and ending after
                  bus ready to indicate a locked series of bus transactions.
M/IO#        O    Memory/Input Output is driven with address to indicate a
                  memory or I/O transaction.
NA#          I    Next Address indicates that the external system will accept
                  an address for a new bus cycle in two bus clocks.
NMI          I    Non Maskable Interrupt is an emulator signal.



